NEW BOUNDS FOR SZEMERÉDI'S THEOREM, I: PROGRESSIONS
OF LENGTH 4 IN FINITE FIELD GEOMETRIES



BEN GREEN AND TERENCE TAO 



Abstract. Let $k \geq 3$ be an integer, and let $G$ be a finite abelian group with $|G| = N$,
where $(N, (k-1)!) = 1$. We write $r_k(G)$ for the largest cardinality $|A|$ of a set $A \subset G$
which does not contain $k$ distinct elements in arithmetic progression.

The famous theorem of Szemerédi essentially asserts that $r_k(\mathbb{Z}/N\mathbb{Z}) = o_k(N)$. It is
known, in fact, that the estimate $r_k(G) = o_k(N)$ holds for all $G$.

There have been many papers concerning the issue of finding quantitative bounds
for $r_k(G)$. A result of Bourgain states that

$$r_3(G) \ll N(\log\log N/\log N)^{1/2}$$

for all $G$. In this paper we obtain a similar bound for $r_4(G)$ in the particular case
$G = F^n$, where $F$ is a fixed finite field with $\mathrm{char}(F) \neq 2, 3$ (for example, $F = \mathbb{F}_5$). We
prove that

$$r_4(G) \ll_F N(\log N)^{-c}$$

for some absolute constant $c > 0$. In future papers we will treat general abelian groups
$G$, eventually obtaining a comparable result for arbitrary $G$.



1. Introduction 

Let $G$ be a finite abelian group with cardinality $N$, written additively. Let $k \geq 3$ be an
integer, and suppose that $(N, (k-1)!) = 1$ (or equivalently, that every non-zero element
of $G$ has order at least $k$). We define $r_k(G)$ to be the largest cardinality $|A|$ of a set
$A \subset G$ which does not contain an arithmetic progression $(x, x+d, \dots, x+(k-1)d)$ with
$d \neq 0$ (such progressions will be referred to as proper).
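To make the definition of $r_k(G)$ concrete, here is a minimal brute-force sketch for very small groups. The representation of $G$ as a product of cyclic groups given by a tuple `moduli`, and the function names, are assumptions made for this illustration only; the search is exponential in $|G|$ and is not meant as a practical algorithm.

```python
from itertools import combinations, product

def has_proper_ap(A, moduli, k):
    """True if A contains x, x+d, ..., x+(k-1)d with d != 0.

    Elements of G = Z/m_1 x ... x Z/m_r are tuples of residues.
    """
    S = set(A)
    zero = tuple(0 for _ in moduli)
    for x in S:
        for d in product(*(range(m) for m in moduli)):
            if d == zero:
                continue  # only proper progressions count
            if all(tuple((xi + j * di) % m for xi, di, m in zip(x, d, moduli)) in S
                   for j in range(1, k)):
                return True
    return False

def r_k(moduli, k):
    """Largest size of a subset of G with no proper k-term progression (brute force)."""
    G = list(product(*(range(m) for m in moduli)))
    for size in range(len(G), 0, -1):
        if any(not has_proper_ap(A, moduli, k) for A in combinations(G, size)):
            return size
    return 0
```

For instance, `r_k((3, 3), 3)` searches all subsets of $\mathbb{F}_3^2$, where a proper three-term progression is the same thing as a line.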

A deep and famous theorem of Szemerédi [29] asserts that any set of integers with
positive upper density contains arbitrarily long arithmetic progressions. This is easily
seen to be equivalent to the assertion that

$$r_k(\mathbb{Z}/N\mathbb{Z}) = o_k(N). \qquad (1.1)$$

Here $o_k(N)$ denotes a quantity which, when divided by $N$, goes to zero as $N \to \infty$
for each fixed $k$. It is known, in fact, that $r_k(G) = o_k(N)$ for all $G$; this may be
proved by combining Szemerédi's theorem with the density Hales-Jewett theorem [5],
and also follows from any of the recent hypergraph regularity results (see p3j and
subsequent papers by the same authors and [32]). When $k = 3$, the assertion (1.1) was
proved earlier by Roth [26], who in fact obtained the quantitative bound

$$r_3(\mathbb{Z}/N\mathbb{Z}) \ll N/\log\log N. \qquad (1.2)$$

As usual we write $X \ll Y$ if we have the bound $X \leq CY$ for some absolute constant
$C > 0$; if this constant $C$ depends on some additional parameters then we will denote
this by subscripting the $\ll$ notation appropriately. Roth's bound was improved to

$$r_3(\mathbb{Z}/N\mathbb{Z}) \ll N(\log N)^{-c} \qquad (1.3)$$

for some absolute constant $c > 0$, independently by Heath-Brown [20] and Szemerédi
[28]. This bound was then further improved to

$$r_3(\mathbb{Z}/N\mathbb{Z}) \ll N(\log\log N/\log N)^{1/2} \qquad (1.4)$$

by Bourgain [2]. This is the best bound currently known. It should be compared with
the famous conjecture of Erdős and Turán [3], which asserts that if $A \subset \mathbb{N}$ is a set of
integers with $\sum_{a \in A} a^{-1} = \infty$, then $A$ contains progressions of length $k$ for all $k$. This
statement is unknown even when $k = 3$, in which case it is more-or-less equivalent to
proving that $r_3(\mathbb{Z}/N\mathbb{Z}) \ll_\varepsilon N/(\log N)^{1+\varepsilon}$ for all $\varepsilon > 0$.

Finding quantitative bounds for $r_4(\mathbb{Z}/N\mathbb{Z})$ proved to be much more difficult. Many of
the known proofs that $r_4(\mathbb{Z}/N\mathbb{Z}) = o(N)$, such as [U Q21 E3 E7J ESI ED [33], give very
weak bounds or no explicit bounds at all. It was a great advance when Gowers [HI ID]
proved that

$$r_4(\mathbb{Z}/N\mathbb{Z}) \ll N(\log\log N)^{-c}$$

for some absolute constant $c > 0$. This is the best bound currently known. Our goal
in this paper and the next two in the series is to bring our knowledge concerning $r_4(G)$
into line with that concerning $r_3$.

The arguments of Roth, Heath-Brown, Szemerédi and Bourgain can all (with varying
degrees of effort) be adapted to give bounds for $r_3(G)$ of the same strength as (1.2),
(1.3) and (1.4) above for a general $G$. It was observed in [21] that Roth's argument is
particularly simple when $G = \mathbb{F}_3^n$. In fact in this setting all four of the arguments of
[2, 20, 26, 28] are essentially the same and give the bound

$$r_3(\mathbb{F}_3^n) \ll N/\log N.$$

This idea of looking at finite field models for additive problems has proved very fruitful. 
The chief reason for its success is that arguments of linear algebra are available in the 
finite field setting, but not in general groups (see the survey [H] for more information). 

Our main theorem in this paper is 

Theorem 1.1. Let $F$ be a fixed finite field with $\mathrm{char}(F) \neq 2, 3$. Let $G = F^n$, and write
$N := |F|^n$. Then we have the bound

$$r_4(G) \ll N(\log_{|F|} N)^{-c},$$

for some absolute constant $c > 0$ (in fact one can take $c = 2^{-21}$). The implied constant
is absolute.

Remark. One might perhaps keep in mind the example $F = \mathbb{F}_5$.

This paper, like all previous papers obtaining quantitative bounds for $r_k(G)$, uses the
density increment strategy. See [H] for a general discussion of this strategy, and the
book [33] for proofs of (1.2), (1.3) and (1.4). Gowers [SI E] obtained his bound using
this strategy and some quadratic Fourier analysis. Indeed, his bound may be described
as the quadratic version of Roth's argument for $r_3$. In [16] we obtained the bound

$$r_4(G) \ll N(\log\log N)^{-c} \qquad (1.5)$$

by an elaboration of the same method. The argument for $G = \mathbb{F}_5^n$, which is rather
simpler than the general case, may be found in §7 of that paper and contains some of
the ideas which will be important later on.

The first step of that argument, and indeed one of the main results of [16], was an
inverse theorem for the Gowers $U^3(\mathbb{F}_5^n)$ norm. Combined with a so-called generalized
von Neumann theorem, this implies a certain very useful dichotomy. Let $A \subset \mathbb{F}_5^n$ be a
set with density $\alpha$. Then either $A$ contains roughly $\alpha^4 N^2$ progressions of length four
(and hence at least one non-trivial progression) or $A$ has density at least $\alpha + c(\alpha)$ on
some set of the form $\{x \in \mathbb{F}_5^n : q(x) = \lambda\}$, where $q : \mathbb{F}_5^n \to \mathbb{F}_5$ is a quadratic form, and
$c(\alpha) > 0$ is an explicit positive quantity depending only on $\alpha$.

The next step is to linearize the level set $\{q(x) = \lambda\}$, covering it by cosets of a subspace
of dimension about $n/2$. $A$ must have density at least $\alpha + c(\alpha)/2$ on at least one of
these, and this gives the basis for a density increment argument.

Linearization is very costly, and is the chief reason that the bound in (1.5) contains
an iterated logarithm. One way of avoiding linearization would be to work the whole
argument on joint level sets ("quadratic submanifolds")

$$\{x : q_1(x) = \lambda_1, q_2(x) = \lambda_2, \dots, q_d(x) = \lambda_d\}, \qquad (1.6)$$

obtaining density increments on successive sets of this type (with $d$ increasing at each
stage). Obtaining the relevant $U^3$ inverse and generalized von Neumann theorems turns
out to be very troublesome, though it can be done; we hope to report further on this
strategy in a future paper.

In this paper we adopt a compromise approach, which may be thought of as the quadratic
analogue of the Heath-Brown and Szemerédi bound for $r_3$. Very roughly, we prove
that either $A$ has roughly $\alpha^4 N^2$ four-term APs or there are some quadratics $q_1, \dots, q_d$
such that $A$ has density at least $\alpha + c'(\alpha)$ on a quadratic submanifold such as (1.6). Here
$c'(\alpha)$ is to be thought of as rather larger than $c(\alpha)$. Only now do we linearize, covering
the quadratic submanifold by cosets of some subspace of (it turns out) dimension about
$n/(d+1)$. Note that if we linearized the quadratics one at a time we would pass to a
subspace of dimension $n/2^d$. The relative efficiency of linearizing several quadratics at
a time, together with the larger density increment $c'(\alpha)$, is what leads to the improved
bound of Theorem 1.1.

We are indebted to Timothy Gowers for inspiring this project, which was in fact the 
starting point for our collaboration, preceding (and leading to) such results as [15, 19].

2. General notation

Let $A$ be a finite non-empty set and let $f : A \to \mathbb{C}$ be a function. We use the averaging
notation

$$\mathbb{E}_A(f) = \mathbb{E}_{x \in A} f(x) := \frac{1}{|A|} \sum_{x \in A} f(x).$$

More complex expressions such as $\mathbb{E}_{x \in A, y \in B} f(x, y)$ are similarly defined. We also define
the $L^p$ norms

$$\|f\|_{L^p(A)} := \mathbb{E}_A(|f|^p)^{1/p}$$

for $1 \leq p < \infty$, with the usual convention $\|f\|_{L^\infty(A)} := \sup_{x \in A} |f(x)|$. We also use the
complex inner product

$$\langle f, g \rangle_{L^2(A)} := \mathbb{E}_A(f \overline{g}).$$

We say that $f$ is 1-bounded if $\|f\|_{L^\infty(A)} \leq 1$.

If $A$, $B$ are finite sets with $B$ non-empty, we use $\mathbb{P}_B(A) := |A \cap B|/|B|$ to denote the density of
$A$ in $B$. If $A \subset W$ are finite sets, we use $1_A : W \to \mathbb{R}$ to denote the indicator function
of $A$, thus $1_A(x) = 1$ when $x \in A$ and $1_A(x) = 0$ otherwise. We also write for
Thus for instance $\mathbb{P}_B(A) = \mathbb{E}_B(1_A)$ for all non-empty $B \subset W$.

3. Affine geometry

Observe that to prove Theorem 1.1 it suffices to do so in the special case when $F$ has
prime order, since a vector space over a general finite field can also be interpreted as a
vector space of equal or greater dimension over a field of prime order.

Important convention. Henceforth $F$ will be a fixed finite field of prime order at
least 5. Without this assumption some of our later arguments (Lemma EH for example) 
do not work properly. 

Theorem 1.1 is stated in terms of a vector space over $F$. It is convenient to have an
affine perspective, so that our definitions are insensitive to the choice of origin and thus 
enjoy a translation invariance. In this section we recall some of the basic features of 
affine linear algebra. The notation here may appear somewhat excessive, but we present 
the material in this manner for pedagogical reasons, as we shall shortly be developing 
quadratic analogues of many of the concepts in this section, using the same type of 
notation. 

Definition 3.1 (Affine spaces). Let $G$ be a linear vector space over $F$. We define an
affine subspace $W$ of $G$ to be a translate of a linear subspace $\dot{W}$ of $G$ by some arbitrary
coset representative $y \in W$. We refer to $\dot{W} = W - W$ as the homogenization of $W$ and
$W$ as a coset of $\dot{W}$; note that if $x \in W$ and $h \in \dot{W}$ then $x + h \in W$. Two affine spaces
are said to be parallel if they have the same homogenization. We define the dimension
$\dim(W)$ of an affine space to be the dimension of its homogenization, and if $W$ is an
affine subspace of another affine space $W'$ we refer to the quantity $\dim(W') - \dim(W)$
as the codimension of $W$ in $W'$.

Definition 3.2 (Linear phase function). Let $W$ be an affine space. An (affine-) linear
phase function on $W$ is any map $\phi : W \to \mathbb{R}/\mathbb{Z}$ to the circle group $\mathbb{R}/\mathbb{Z}$ (which we view
additively) such that

$$\phi(x + h_1 + h_2) - \phi(x + h_1) - \phi(x + h_2) + \phi(x) = 0$$

for all $x \in W$ and $h_1, h_2 \in \dot{W}$.

For any finite additive group $G$, define the Pontryagin dual $\widehat{G}$ of $G$ to be the space of
group homomorphisms $\xi : x \mapsto \xi \cdot x$ from $G$ to $\mathbb{R}/\mathbb{Z}$. Given a linear phase function
$\phi$ on $W$, we can define its gradient vector $\nabla\phi \in \widehat{\dot{W}}$ by requiring the Taylor expansion

$$\phi(x + h) = \phi(x) + \nabla\phi \cdot h$$

for all $x \in W$ and $h \in \dot{W}$; it is easy to verify that $\nabla\phi$ is well defined. Also observe that
if $\phi$ is a linear phase function on $W$, then $\phi$ takes at most $|F|$ values, and the level sets
of $\phi$ are parallel affine spaces of codimension at most 1 in $W$.

Every linear phase $\phi$ on $W$ defines an affine character $e(\phi) : W \to \mathbb{C}$, where $e : \mathbb{R}/\mathbb{Z} \to
\mathbb{C}$ is the standard homomorphism $e(x) := e^{2\pi i x}$. These characters could be used to
develop an "affine-linear Fourier analysis". However it will be more convenient to use
traditional (non-affine) linear Fourier analysis. Namely, if $G$ is a linear vector space and
$f : G \to \mathbb{C}$ is a function, we define the Fourier transform $\widehat{f} : \widehat{G} \to \mathbb{C}$ by the formula

$$\widehat{f}(\xi) := \mathbb{E}_{x \in G} f(x) e(-\xi \cdot x).$$

Of course we have the Fourier inversion formula

$$f(x) = \sum_{\xi \in \widehat{G}} \widehat{f}(\xi) e(\xi \cdot x)$$

and the Plancherel identity

$$\|f\|_{L^2(G)}^2 := \mathbb{E}_G |f|^2 = \sum_{\xi \in \widehat{G}} |\widehat{f}(\xi)|^2.$$
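The normalization above (an average on the physical side, a sum on the frequency side) can be checked numerically. The following is a minimal sketch on $G = F_p^n$; identifying the dual of $G$ with $F_p^n$ through $\xi \cdot x := (\sum_i \xi_i x_i)/p \bmod 1$, and the function names, are assumptions made for this illustration.

```python
import cmath
from itertools import product

def e(t):
    """Standard character e(t) = exp(2*pi*i*t) on R/Z."""
    return cmath.exp(2j * cmath.pi * t)

def dot(xi, x, p):
    """Pairing xi . x in R/Z, with the dual of F_p^n identified with F_p^n."""
    return sum(a * b for a, b in zip(xi, x)) / p

def fourier(f, p, n):
    """fhat(xi) = E_{x in G} f(x) e(-xi . x); f is a dict keyed by n-tuples."""
    G = list(product(range(p), repeat=n))
    return {xi: sum(f[x] * e(-dot(xi, x, p)) for x in G) / len(G) for xi in G}

def inverse(fhat, p, n):
    """Fourier inversion: f(x) = sum_xi fhat(xi) e(xi . x) (a sum, not an average)."""
    G = list(product(range(p), repeat=n))
    return {x: sum(fhat[xi] * e(dot(xi, x, p)) for xi in G) for x in G}
```

With these conventions the Plancherel identity reads: the mean of $|f|^2$ over $G$ equals the sum of $|\widehat{f}(\xi)|^2$ over all frequencies.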

4. The form $\Lambda$ and the $U^3(W)$ norm

Let us say that four affine spaces $W_0, W_1, W_2, W_3$ in a common ambient space $W'$ are in
arithmetic progression if they are parallel with common homogeneous space $\dot{W}$, and if
they form an arithmetic progression in the quotient space $W'/\dot{W}$, or equivalently if there
exist $x \in W'$ and $h \in \dot{W}'$ such that $W_j = x + jh + \dot{W}$ for all $j = 0, 1, 2, 3$. In particular
for any affine space $W$, the quadruple $W, W, W, W$ is in arithmetic progression.

In the Fourier-analytic or ergodic approaches to counting progressions of length 4, a
fundamental role is played by the quadrilinear form $\Lambda_{W_0,W_1,W_2,W_3}(f_0, f_1, f_2, f_3)$, defined
for four affine spaces $W_0, W_1, W_2, W_3$ in arithmetic progression together with functions
$f_j : W_j \to \mathbb{C}$ for $j = 0, 1, 2, 3$, by

$$\Lambda_{W_0,W_1,W_2,W_3}(f_0, f_1, f_2, f_3) := \mathbb{E}_{x \in W_0,\, h \in W_1 - W_0} f_0(x) f_1(x + h) f_2(x + 2h) f_3(x + 3h).$$

We shall abbreviate $\Lambda_{W_0,W_1,W_2,W_3}$ as $\Lambda$ when the spaces $W_0, W_1, W_2, W_3$ are clear from
context (in particular, the spaces $W_0, W_1, W_2, W_3$ will usually be equal). This quantity
is clearly related to the number of arithmetic progressions of length 4 in a set $A \subset W$.
In particular, if $A$ has no proper progressions of length four, then only the terms with
$h = 0$ contribute, and it is easy to see that

$$\Lambda_{W,W,W,W}(1_A, 1_A, 1_A, 1_A) = \mathbb{P}_W(A)/|W|. \qquad (4.1)$$
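As an illustration of the form $\Lambda$ and of the identity (4.1), here is a small exact computation on $W = F_p^n$ with all four spaces equal. The function name and the use of integer-valued callables for the $f_j$ are assumptions of this sketch.

```python
from fractions import Fraction
from itertools import product

def Lambda(f0, f1, f2, f3, p, n):
    """Average of f0(x) f1(x+h) f2(x+2h) f3(x+3h) over all x, h in F_p^n.

    The f_j are callables returning integers, so the result is an exact Fraction.
    """
    G = list(product(range(p), repeat=n))

    def shift(x, h, j):
        return tuple((a + j * b) % p for a, b in zip(x, h))

    total = sum(f0(x) * f1(shift(x, h, 1)) * f2(shift(x, h, 2)) * f3(shift(x, h, 3))
                for x in G for h in G)
    return Fraction(total, len(G) ** 2)
```

For example, the set $A = \{0,1,2\}^2 \subset \mathbb{F}_5^2$ has no proper four-term progression (any non-zero coordinate of $d$ would force four distinct values in that coordinate), so $\Lambda(1_A, 1_A, 1_A, 1_A)$ should equal $\mathbb{P}_W(A)/|W|$ there.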



We can now describe our "density increment" step, which (as we shall shortly see) easily
implies Theorem 1.1 upon iteration.

Theorem 4.1 (Anomalous number of AP4s implies density increment). Let $W$ be an
affine space, and let $f : W \to \mathbb{R}$ be a 1-bounded non-negative function (thus $0 \leq f(x) \leq 1$
for all $x \in W$). Set $\delta := \mathbb{E}_W(f)$. Suppose that

$$|\Lambda_{W,W,W,W}(f, f, f, f) - \Lambda_{W,W,W,W}(\delta, \delta, \delta, \delta)| \geq \delta^4/2. \qquad (4.2)$$

Let $C_1 := 2^{20}$. Then at least one of the following two statements holds:

• (Medium-sized density increment on large space) There exists an affine subspace
$W'$ of $W$ with dimension satisfying

$$\dim(W') \geq \dim(W) - (2/\delta)^{C_1} \qquad (4.3)$$

such that we have the density increment

$$\mathbb{E}_{W'}(f) \geq \delta + 2^{-44}\delta^{16}. \qquad (4.4)$$

• (Large density increment on medium-sized space) There exists an integer $K$,
$1 \leq K \leq 2^{33}\delta^{-12}$, and an affine subspace $W'$ of $W$ with dimension satisfying

$$\dim(W') \geq \frac{1}{K+1}\dim(W) - (2/\delta)^{C_1} \qquad (4.5)$$

such that we have the density increment

$$\mathbb{E}_{W'}(f) \geq \delta(1 + 2^{-15}K^{1/3}). \qquad (4.6)$$



Proof of Theorem 1.1 assuming Theorem 4.1. It suffices to prove the following claim,
which is more-or-less equivalent to Theorem 1.1.

Claim. Let $A$ be a subset of an affine space $W$ with $\mathbb{P}_W(A) = \delta$. If

$$\dim(W) \geq (2/\delta)^{C_2},$$

where $C_2 := 2^{21}$, then $A$ contains a proper arithmetic progression of length four.

We prove this by induction on $\dim(W)$. This induction may alternatively be viewed as
an iterated application of Theorem 4.1. We may assume that $\dim(W) \geq 2^{C_2}$ since the
claim is vacuous otherwise. This provides a start for the induction. Let $f := 1_A$, so that
$\delta = \mathbb{E}_W(f)$. Supposing for a contradiction that $A$ does not contain proper arithmetic
progressions of length four, we see from (4.1) that

$$\Lambda_{W,W,W,W}(f, f, f, f) = \delta/|F|^{\dim(W)} \leq \delta/5^{(2/\delta)^{C_2}} < \delta^4/2;$$

and thus (4.2) holds, since $\Lambda_{W,W,W,W}(\delta, \delta, \delta, \delta) = \delta^4$. Applying Theorem 4.1, we conclude that either

(i) there is a medium-sized density increment on a large space, meaning that (4.3)
and (4.4) hold, or

(ii) there is a large density increment on a medium-sized space, meaning that (4.5)
and (4.6) both hold for some parameter $K$, $1 \leq K \leq 2^{33}\delta^{-12}$.

Suppose that (i) holds. From (4.4) and the fact that $\mathbb{E}_W f = \delta$ we see that $\dim(W') <
\dim(W)$. Applying the induction hypothesis, we see that it suffices to check that

$$(2/\delta)^{C_2} - (2/\delta)^{C_1} \geq (2/(\delta + 2^{-44}\delta^{16}))^{C_2},$$

or in other words that

$$(1 + 2^{-44}\delta^{15})^{-C_2} \leq 1 - (\delta/2)^{C_2 - C_1}.$$

Using the inequality $(1 + x)^{-a} \leq 1 - ax + \frac{1}{2}a(a + 1)x^2$, valid for $x, a \geq 0$, this is easy to
verify since $C_2 - C_1$ and $C_2$ are so large.

Suppose alternatively that (ii) holds. Applying the induction hypothesis once more, it
suffices to show that

$$\frac{1}{K+1}(2/\delta)^{C_2} - (2/\delta)^{C_1} \geq \big(2/\delta(1 + 2^{-15}K^{1/3})\big)^{C_2},$$

or in other words that

$$\frac{1}{(1 + 2^{-15}K^{1/3})^{C_2}} \leq \frac{1}{K+1} - \Big(\frac{\delta}{2}\Big)^{C_2 - C_1},$$

which is again easy to verify since $C_2$, $C_2 - C_1$ are so large. □
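The two "easy to verify" inequalities at the end of the proof can be sanity-checked numerically. The sketch below is only a spot check at sample values of $\delta$ and $K$, not a proof. It assumes the constants $C_1 = 2^{20}$, $C_2 = 2^{21}$ and the factor $2^{-15}$ from (4.6), and it compares logarithms because $(\delta/2)^{C_2 - C_1}$ underflows in floating point; the function names are hypothetical.

```python
import math

C1 = 2 ** 20
C2 = 2 ** 21

def log_rhs(delta):
    """log of (delta/2)^(C2 - C1), kept in the log domain to avoid underflow."""
    return (C2 - C1) * math.log(delta / 2)

def case_i_holds(delta):
    """Check (1 + 2^-44 delta^15)^(-C2) <= 1 - (delta/2)^(C2 - C1)."""
    x = 2.0 ** -44 * delta ** 15
    # gap = 1 - (1 + x)^(-C2), computed stably for tiny x
    gap = -math.expm1(-C2 * math.log1p(x))
    return gap > 0 and math.log(gap) >= log_rhs(delta)

def case_ii_holds(delta, K):
    """Check (1 + 2^-15 K^(1/3))^(-C2) <= 1/(K+1) - (delta/2)^(C2 - C1)."""
    lhs = math.exp(-C2 * math.log1p(2.0 ** -15 * K ** (1.0 / 3.0)))
    gap = 1.0 / (K + 1) - lhs
    return gap > 0 and math.log(gap) >= log_rhs(delta)
```

In both cases the left-hand quantity beats the right-hand one by an enormous margin, which is consistent with the remark that the constants are "so large".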

Much of the material in subsequent sections revolves around the estimation of $\Lambda$ in
various ways. Let us present just two simple estimates here. If $f_0, f_1, f_2, f_3$ are
1-bounded functions on an arithmetic progression of affine spaces $W_0, W_1, W_2, W_3$, then
we have

$$|\Lambda_{W_0,W_1,W_2,W_3}(f_0, f_1, f_2, f_3)| \leq \min_{0 \leq j \leq 3} \|f_j\|_{L^1(W_j)}. \qquad (4.7)$$

This follows immediately from applying the change of variables $x \mapsto x - jh$ followed by
the triangle inequality. It leads to an easy consequence:

Lemma 4.2 ($L^1$ controls $\Lambda$). Let $f_0, f_1, f_2, f_3, g_0, g_1, g_2, g_3 : W \to \mathbb{C}$ be functions on an
affine space $W$ which are all uniformly bounded by some $\alpha > 0$. Then we have

$$|\Lambda_{W,W,W,W}(f_0, f_1, f_2, f_3) - \Lambda_{W,W,W,W}(g_0, g_1, g_2, g_3)| \leq 4\alpha^3 \sup_{0 \leq i \leq 3} \|f_i - g_i\|_{L^1(W)}.$$

Proof. By dividing $f_i$ and $g_i$ by $\alpha$ we may normalize so that $\alpha = 1$. We abbreviate
$\Lambda_{W,W,W,W}$ as $\Lambda$. The claim then follows from the telescoping identity

$$\Lambda(f_0, f_1, f_2, f_3) - \Lambda(g_0, g_1, g_2, g_3) = \Lambda(f_0 - g_0, g_1, g_2, g_3) + \Lambda(f_0, f_1 - g_1, g_2, g_3)$$
$$\qquad + \Lambda(f_0, f_1, f_2 - g_2, g_3) + \Lambda(f_0, f_1, f_2, f_3 - g_3) \qquad (4.8)$$

and (4.7). □

The next lemma involves, for the first time in this paper, the Gowers $U^3$-norm on $W$
(cf. [HI EH ES]). If $f : W \to \mathbb{C}$ is a function, recall that

$$\|f\|_{U^3(W)}^8 := \mathbb{E}_{x \in W;\, h_1, h_2, h_3 \in \dot{W}} \big( f(x) \overline{f(x + h_1)}\, \overline{f(x + h_2)}\, \overline{f(x + h_3)}\, f(x + h_1 + h_2) \times$$
$$\qquad \times f(x + h_2 + h_3) f(x + h_1 + h_3) \overline{f(x + h_1 + h_2 + h_3)} \big).$$

The Gowers $U^3$-norm measures the extent to which $f$ behaves "quadratically". Note for
example that if $f(x) = \omega^{\phi(x)}$, where $\phi : F^n \to F$ is a quadratic form and $\omega = e^{2\pi i/|F|}$,
then $\|f\|_{U^3} = 1$, the largest possible $U^3$ norm of a 1-bounded function. The $U^3$-norm
also controls progressions of length four in a sense to be made precise in Lemma 4.3
below. There are also Gowers $U^d$-norms for $d = 2, 3, \dots$, with the $U^d$-norm controlling
progressions of length $d + 1$. Properties of the Gowers norms may be found in [51 fTU| [T3]:
the paper [TO] and book [33] provide a comprehensive discussion of the $U^3$-norm. For
a discussion in an ergodic theory context see [22]. Previous papers only consider the
Gowers norms on abelian groups, but the generalization to affine spaces is a triviality.
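The following minimal sketch evaluates $\|f\|_{U^3}^8$ directly from the definition on $W = F_p$ (the case $n = 1$, chosen only to keep the brute-force loop small). It uses the standard convention of conjugating the factors with an odd number of shifts; the function name and the list representation of $f$ are assumptions of this illustration.

```python
import cmath
from itertools import product

def u3_norm_8(f, p):
    """||f||_{U^3}^8 on F_p: average over x, h1, h2, h3 of the 8-fold product.

    f is a list of length p; factors at cube vertices with an odd number
    of shifts are complex-conjugated.
    """
    total = 0
    for x, h1, h2, h3 in product(range(p), repeat=4):
        term = 1
        for e1, e2, e3 in product((0, 1), repeat=3):
            v = f[(x + e1 * h1 + e2 * h2 + e3 * h3) % p]
            term *= v.conjugate() if (e1 + e2 + e3) % 2 else v
        total += term
    return (total / p ** 4).real
```

Feeding in a quadratic phase such as $f(x) = \omega^{x^2}$ gives the extremal value 1 up to rounding, whereas a cubic phase $\omega^{x^3}$ or the indicator of a single point gives something strictly smaller.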

Lemma 4.3 (Generalized von Neumann). Suppose that $f_0, f_1, f_2, f_3 : W \to \mathbb{C}$ are
1-bounded functions. Then we have

$$|\Lambda_{W,W,W,W}(f_0, f_1, f_2, f_3)| \leq \min_{0 \leq i \leq 3} \|f_i\|_{U^3(W)}.$$

In fact more generally if $W_0, W_1, W_2, W_3$ are in arithmetic progression and if $f_i : W_i \to \mathbb{C}$
are 1-bounded functions then we have

$$|\Lambda_{W_0,W_1,W_2,W_3}(f_0, f_1, f_2, f_3)| \leq \min_{0 \leq i \leq 3} \|f_i\|_{U^3(W_i)}. \qquad □$$

Remarks. The first statement is p21 Proposition 1.7], and is proved in §4 of that paper 
using three applications of the Cauchy-Schwarz inequality. Versions of this inequality 
appear in several earlier works also, such as [8]. The second statement may be proved
in the same way, with trivial notational changes. 

Using the telescoping identity (4.8), we conclude the following variant of Lemma 4.2.

Lemma 4.4 ($U^3$ controls $\Lambda$). Let $f, g : W \to \mathbb{C}$ be 1-bounded functions on an affine
space $W$. Then we have

$$|\Lambda_{W,W,W,W}(f, f, f, f) - \Lambda_{W,W,W,W}(g, g, g, g)| \leq 4\|f - g\|_{U^3(W)}.$$

Lemmas 4.2 and 4.4 show that errors with small $L^1$ or $U^3$ norm are negligible for the
purposes of counting progressions of length 4. We understand the $L^1$ norm very well,
but the $U^3$ norm is far more mysterious. For us, a key tool in its study will be the
inverse theorem of [16], providing a description of those $f$ for which $\|f\|_{U^3}$ is large.



5. The inverse $U^3(W)$ theorem



We begin with some notation. 






Definition 5.1 (Quadratic phase function). Let $W$ be an affine space. An (affine)
quadratic phase function on $W$ is any map $\phi : W \to \mathbb{R}/\mathbb{Z}$ such that

$$\phi(x + h_1 + h_2 + h_3) - \phi(x + h_1 + h_2) - \phi(x + h_2 + h_3) - \phi(x + h_1 + h_3)$$
$$\qquad + \phi(x + h_1) + \phi(x + h_2) + \phi(x + h_3) - \phi(x) = 0$$

for all $x \in W$ and $h_1, h_2, h_3 \in \dot{W}$.

Let us make the trivial remark that the translation of a quadratic phase function remains 
quadratic (even if one also translates the underlying space W), and the restriction of 
a quadratic phase function to an affine subspace remains quadratic. Also, every linear 
phase function is automatically quadratic. An example of a quadratic phase function
on $G = F^n$ is $\phi(x) = \nu(Mx \cdot x + \xi \cdot x) + c$, where $M : F^n \to F^n$ is a self-adjoint
linear transformation, $\xi \in G$, $c \in \mathbb{R}/\mathbb{Z}$ and $\nu : F \to \mathbb{R}/\mathbb{Z}$ is some fixed homomorphism
of additive groups. In fact (as we shall see) every quadratic phase can be written
explicitly in this way.

Quadratic phases are closely tied to the $U^3(W)$ norm; indeed one can easily verify that
a 1-bounded function $f : W \to \mathbb{C}$ has $U^3(W)$ norm bounded by 1, with equality holding
if and only if $f = e(\phi)$ for some quadratic phase $\phi : W \to \mathbb{R}/\mathbb{Z}$. A more quantitative
version of this fact is as follows.

Theorem 5.2 (Inverse theorem for $U^3(W)$). [16, Theorem 2.3] Let $f : W \to \mathbb{C}$ be a
1-bounded function on an affine space $W$ such that $\|f\|_{U^3(W)} \geq \eta$ for some $0 < \eta \leq 1$.
Let $C_0 := 2^{16}$. Then there exists a linear subspace $\dot{W}'$ of $\dot{W}$ of codimension at most
$(2/\eta)^{C_0}$ such that for each coset $W'$ of $\dot{W}'$, there exists a quadratic phase function
$\phi_{W'} : W' \to \mathbb{R}/\mathbb{Z}$ such that

$$\mathbb{E}_{W' \in W/\dot{W}'} |\mathbb{E}_{x \in W'} f(x) e(-\phi_{W'}(x))| \geq (\eta/2)^{C_0}. \qquad (5.1)$$

Remarks. The result in [16] is phrased for vector spaces over $F$ rather than affine spaces
but the extension to the affine case is a triviality. Also the averaging in [16] is over coset
representatives rather than actual cosets but the two are related by an easy application
of the pigeonhole principle. There is a corresponding theorem in arbitrary finite groups
$G$ but it is somewhat more complicated in that the subspace $\dot{W}'$ needs to be replaced
by a Bohr set: see [16]. An analogue of this result also holds in the characteristic 2 case
(Samorodnitsky, private communication) but we will not need it here.

In [16] it was conjectured that one could in fact take $W' = W$ (possibly at the cost
of deteriorating the constant $C_0 = 2^{16}$), in which case this inverse theorem takes a
particularly simple form (see footnote 1): if the $U^3(G)$ norm of $f$ is large, then $f$ has large correlation
with $e(\phi)$ for some quadratic phase $\phi : W \to \mathbb{R}/\mathbb{Z}$. This would simplify the arguments
in this paper somewhat, though in practice it is relatively inexpensive to pass from $W$
to the slightly smaller space $W'$ as necessary.



1 One can use a simple Fourier averaging, combined with a certain "quadratic extension theorem",
to achieve a version of this, but with bounds that deteriorate exponentially in $\eta$; see [16]. If we wish
to prove Theorem 1.1 we cannot afford such exponential losses and so will not use this fact.

As mentioned in the introduction, one can use this theorem (together with Corollary
7.4 below) to run a density increment argument. This yields a weak version of Theorem
4.1, giving a bound of the form $r_4(F^n) \ll_F N(\log\log N)^{-c}$ for some $c > 0$: see [16,
§7] for the details. We will however take a more efficient route involving the energy
increment argument from [15], motivated both by considerations from ergodic theory
(notably Furstenberg's ergodic proof [U E] of Szemerédi's theorem) and from regularity
lemmas from graph theory (in particular Szemerédi's regularity lemma [29], as well as
an arithmetic analogue of the first author [13]).

6. Linear and quadratic factors, and a quadratic Koopman-von Neumann theorem

To convert the inverse theorem to a quadratic structure theorem (or quadratic
Koopman-von Neumann theorem) we need some concepts from ergodic theory. Note, however,
that in our context all of these constructions are purely finitary.

Definition 6.1 (Factors). Let $W$ be an arbitrary finite non-empty set (typically $W$ will
be an affine space). Define a factor (or $\sigma$-algebra) in $W$ to be a collection $\mathcal{B}$ of subsets
of $W$ which is closed under union, intersection, and complement, and which contains
$\emptyset$ and $W$. Define an atom of $\mathcal{B}$ to be a minimal non-empty element of $\mathcal{B}$; these partition
$W$, and indeed in this finitary setting the set of factors can be placed in one-to-one
correspondence with the set of partitions of $W$. A function $f : W \to X$ with arbitrary
finite range $X$ is said to be measurable with respect to $\mathcal{B}$, or $\mathcal{B}$-measurable for short, if
all its level sets lie in $\mathcal{B}$. We let $\mathcal{B}_f$ denote the factor generated by $f$, thus the atoms
of $\mathcal{B}_f$ are precisely the non-empty level sets of $f$. We say that one factor $\mathcal{B}'$ extends
another $\mathcal{B}$ if $\mathcal{B} \subset \mathcal{B}'$; we also say that $\mathcal{B}$ is a factor of $\mathcal{B}'$ in this case. If $W'$ is any
non-empty subset of $W$ and $\mathcal{B}$ is a factor in $W$, we define the restriction $\mathcal{B}|_{W'}$ of $\mathcal{B}$ to
$W'$ to be the factor in $W'$ formed by intersecting all the sets in $\mathcal{B}$ with $W'$. Note that
this is a subset of $\mathcal{B}$ if $W' \in \mathcal{B}$. If $\mathcal{B}$, $\mathcal{B}'$ are factors in $W$ we let $\mathcal{B} \vee \mathcal{B}'$ be the smallest
common extension (thus the atoms of $\mathcal{B} \vee \mathcal{B}'$ are the intersections of atoms of $\mathcal{B}$ and
atoms of $\mathcal{B}'$). If $f : W \to \mathbb{C}$, we let $\mathbb{E}(f|\mathcal{B}) : W \to \mathbb{C}$ denote the conditional expectation

$$\mathbb{E}(f|\mathcal{B})(x) := \mathbb{E}(f|\mathcal{B}(x)) \quad \text{for all } x \in W,$$

where $\mathcal{B}(x)$ is the unique atom in $\mathcal{B}$ that contains $x$, and the right-hand side is the
average of $f$ over that atom. Equivalently, $\mathbb{E}(f|\mathcal{B})$ is the orthogonal
projection (in the Hilbert space $L^2(W)$) of $f$ to the space of $\mathcal{B}$-measurable
functions.
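In this finitary setting the conditional expectation is just "average over the atom containing $x$". The sketch below makes that concrete; representing a factor by a labelling function whose level sets are its atoms, and the function names, are assumptions made for this illustration.

```python
from collections import defaultdict

def conditional_expectation(f, label):
    """E(f|B), where the atoms of B are the level sets of `label`.

    f and label are dicts defined on the same finite set W.
    """
    atoms = defaultdict(list)
    for x in f:
        atoms[label[x]].append(x)
    out = {}
    for points in atoms.values():
        avg = sum(f[x] for x in points) / len(points)
        for x in points:
            out[x] = avg          # constant on each atom
    return out

def energy(g):
    """||g||_{L^2(W)}^2 = E_W |g|^2."""
    return sum(abs(v) ** 2 for v in g.values()) / len(g)
```

Because $\mathbb{E}(f|\mathcal{B})$ is an orthogonal projection, applying it twice changes nothing, and the energy of $f$ splits as the energy of $\mathbb{E}(f|\mathcal{B})$ plus the energy of $f - \mathbb{E}(f|\mathcal{B})$; this Pythagorean splitting is what drives the energy increment argument later in the section.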

We will focus our attention on very structured factors, namely linear and quadratic 
factors, which turn out in the finite field setting to be the only factors required to 
analyze progressions of length four. 

Definition 6.2 (Linear factors). Let $W$ be an affine space. A linear factor of complexity
at most $d$ is any factor $\mathcal{B}$ in $W$ of the form $\mathcal{B} = \mathcal{B}_{\phi_1} \vee \dots \vee \mathcal{B}_{\phi_{d'}}$, where $0 \leq d' \leq d$ and
$\phi_1, \dots, \phi_{d'}$ are linear phase functions on $W$.

Observe that if $\mathcal{B}$ is a linear factor of complexity at most $d$, then the atoms of $\mathcal{B}$ are
parallel affine spaces of codimension at most $d$. Also, if $\mathcal{B}$ and $\mathcal{B}'$ are linear factors of
complexity at most $d$, $d'$ respectively, then $\mathcal{B} \vee \mathcal{B}'$ is also a linear factor, with complexity at most
$d + d'$.

Definition 6.3 (Quadratic factors). Let $W$ be an affine space. A pure quadratic factor
of complexity at most $d$ is any factor $\mathcal{B}$ in $W$ of the form $\mathcal{B} = \mathcal{B}_{\phi_1} \vee \dots \vee \mathcal{B}_{\phi_{d'}}$, where
$0 \leq d' \leq d$ and $\phi_1, \dots, \phi_{d'}$ are quadratic phase functions on $W$. A quadratic factor of
complexity at most $(d_1, d_2)$ is any pair $(\mathcal{B}_1, \mathcal{B}_2)$ of factors in $W$, where $\mathcal{B}_1$ is a linear factor
of complexity at most $d_1$, and $\mathcal{B}_2$ is an extension of $\mathcal{B}_1$, whose restriction to any atom
of $\mathcal{B}_1$ is a pure quadratic factor of complexity at most $d_2$. We say that one quadratic
factor $(\mathcal{B}'_1, \mathcal{B}'_2)$ is a quadratic extension of another $(\mathcal{B}_1, \mathcal{B}_2)$ if $\mathcal{B}_1 \subset \mathcal{B}'_1$ and $\mathcal{B}_2 \subset \mathcal{B}'_2$.

Remark. Note that a linear factor of complexity $d_1$ can have as many as $|F|^{d_1}$ atoms, and
thus a quadratic factor of complexity $(d_1, d_2)$ can involve as many as $|F|^{d_1} d_2$ quadratics
(on up to $|F|^{d_1}$ different domains), though it is quite likely that an improved version
of the inverse theorem in [16] would reduce the number of quadratics involved here.
Some care must be taken to avoid this exponential dependence on the complexity from
destroying the polynomial nature of many of the quantities in our arguments.
Fortunately, by working "locally" on linear atoms rather than globally on all of $W$ one can
avoid any unpleasant factors of $|F|^{d_1}$ in our analysis.

Observe that if $(\mathcal{B}_1, \mathcal{B}_2)$ and $(\mathcal{B}'_1, \mathcal{B}'_2)$ are quadratic factors of complexity at most $(d_1, d_2)$
and $(d'_1, d'_2)$ respectively, then their common extension $(\mathcal{B}_1 \vee \mathcal{B}'_1, \mathcal{B}_2 \vee \mathcal{B}'_2)$ is a quadratic
factor of complexity at most $(d_1 + d'_1, d_2 + d'_2)$; this is because the restriction of a pure
quadratic factor to an affine subspace remains a pure quadratic factor of equal or lesser
complexity.

The inverse theorem, Theorem 5.2, can now be rephrased in terms of quadratic factors
as follows.

Theorem 6.4 (Inverse theorem for $U^3(W)$, again). Let $f : W \to \mathbb{C}$ be a 1-bounded
function on an affine space $W$ such that $\|f\|_{U^3(W)} \geq \eta$ for some $\eta$, $0 < \eta \leq 1$. Then
there exists a quadratic factor $(\mathcal{B}_1, \mathcal{B}_2)$ in $W$ of complexity at most $((2/\eta)^{C_0}, 1)$ such that

$$\|\mathbb{E}(f|\mathcal{B}_2)\|_{L^1(W)} \geq (\eta/2)^{C_0}.$$

This has the following consequence.

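Corollary 6.5 (Lack of relative uniformity implies energy increment). Let $(\mathcal{B}_1, \mathcal{B}_2)$ be a
quadratic factor of complexity at most $(d_1, d_2)$ in an affine space $W$, and let $f : W \to \mathbb{R}^+$
be a 1-bounded non-negative function such that $\|f - \mathbb{E}(f|\mathcal{B}_2)\|_{U^3(W)} \geq \eta$ for some $\eta$,
$0 < \eta \leq 1$. Then there exists a quadratic extension $(\mathcal{B}'_1, \mathcal{B}'_2)$ of $(\mathcal{B}_1, \mathcal{B}_2)$ of complexity at most
$(d_1 + (2/\eta)^{C_0}, d_2 + 1)$ such that we have the energy increment

$$\|\mathbb{E}(f|\mathcal{B}'_2)\|_{L^2(W)}^2 \geq \|\mathbb{E}(f|\mathcal{B}_2)\|_{L^2(W)}^2 + (\eta/2)^{2C_0}.$$

Proof. Applying Theorem 6.4 to the 1-bounded (see footnote 2) function $f - \mathbb{E}(f|\mathcal{B}_2)$, we can find a
quadratic factor $(\tilde{\mathcal{B}}_1, \tilde{\mathcal{B}}_2)$ of complexity at most $((2/\eta)^{C_0}, 1)$ such that

$$\|\mathbb{E}(f - \mathbb{E}(f|\mathcal{B}_2) \,|\, \tilde{\mathcal{B}}_2)\|_{L^1(W)} \geq (\eta/2)^{C_0}.$$

If we then let $\mathcal{B}'_1 := \mathcal{B}_1 \vee \tilde{\mathcal{B}}_1$ and $\mathcal{B}'_2 := \mathcal{B}_2 \vee \tilde{\mathcal{B}}_2$, then by Pythagoras' theorem, the
inclusions $\mathcal{B}_2, \tilde{\mathcal{B}}_2 \subset \mathcal{B}'_2$, and Cauchy-Schwarz, we have

$$\|\mathbb{E}(f|\mathcal{B}'_2)\|_{L^2(W)}^2 - \|\mathbb{E}(f|\mathcal{B}_2)\|_{L^2(W)}^2 = \|\mathbb{E}(f - \mathbb{E}(f|\mathcal{B}_2) \,|\, \mathcal{B}'_2)\|_{L^2(W)}^2$$
$$\qquad \geq \|\mathbb{E}(f - \mathbb{E}(f|\mathcal{B}_2) \,|\, \tilde{\mathcal{B}}_2)\|_{L^2(W)}^2$$
$$\qquad \geq \|\mathbb{E}(f - \mathbb{E}(f|\mathcal{B}_2) \,|\, \tilde{\mathcal{B}}_2)\|_{L^1(W)}^2$$
$$\qquad \geq (\eta/2)^{2C_0}.$$

The claim follows. □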

We can employ this corollary repeatedly. This "energy increment argument" allows
us to deduce one of our most important tools, a (quadratic) Koopman-von Neumann
theorem (see footnote 3).

Theorem 6.6 (Quadratic Koopman-von Neumann theorem). Let $f : W \to \mathbb{C}$ be a
1-bounded non-negative function on an affine space $W$, and let $\eta > 0$. Then there exists
a quadratic factor $(\mathcal{B}_1, \mathcal{B}_2)$ in $W$ of complexity at most $((2/\eta)^{3C_0}, (2/\eta)^{2C_0})$ such that

$$\|f - \mathbb{E}(f|\mathcal{B}_2)\|_{U^3(W)} \leq \eta. \qquad (6.1)$$

Proof. Start with $(\mathcal{B}_1, \mathcal{B}_2) = (\{\emptyset, W\}, \{\emptyset, W\})$, which is a quadratic factor of
complexity $(0, 0)$. If (6.1) holds then we are done. Otherwise, we may apply Corollary
6.5 to extend $(\mathcal{B}_1, \mathcal{B}_2)$ to a quadratic factor with complexity incremented by at most
$((2/\eta)^{C_0}, 1)$ and the energy $\|\mathbb{E}(f|\mathcal{B}_2)\|_{L^2(W)}^2$ incremented by at least $(\eta/2)^{2C_0}$. On the
other hand, since $f$ is 1-bounded, the energy $\|\mathbb{E}(f|\mathcal{B}_2)\|_{L^2(W)}^2$ is non-negative and at most 1.
Thus we cannot iterate the above procedure more than $(2/\eta)^{2C_0}$ times before
terminating. The claim follows. □
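The complexity bound in Theorem 6.6 is just the per-step increment from Corollary 6.5 multiplied by the maximal number of steps. Spelled out as a short worked computation (a bookkeeping sketch of the proof above, using only quantities stated there):

```latex
\begin{align*}
\#\{\text{iterations}\} &\le \frac{1}{(\eta/2)^{2C_0}} = (2/\eta)^{2C_0}
  && \text{(energy lies in } [0,1]\text{, each step adds } \ge (\eta/2)^{2C_0}\text{)} \\
d_1 &\le (2/\eta)^{2C_0} \cdot (2/\eta)^{C_0} = (2/\eta)^{3C_0}
  && \text{(each step adds at most } (2/\eta)^{C_0}\text{ to } d_1\text{)} \\
d_2 &\le (2/\eta)^{2C_0} \cdot 1 = (2/\eta)^{2C_0}
  && \text{(each step adds at most } 1 \text{ to } d_2\text{)}
\end{align*}
```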



2 Here we are using the hypothesis that $f$ is non-negative and bounded by 1 to ensure that
$\|f - \mathbb{E}(f|\mathcal{B}_2)\|_\infty \leq 1$. More generally we would replace the 1 on the right-hand side by a 2, which has a
negligible impact on the argument.

3 This theorem decomposes a function $f$ orthogonally into a "quadratically almost periodic"
component $\mathbb{E}(f|\mathcal{B}_2)$ and a "quadratically mixing" component $f - \mathbb{E}(f|\mathcal{B}_2)$. This can be compared with
the ordinary (linear) Koopman-von Neumann theorem in (infinitary) ergodic theory, which splits a
function $f$ orthogonally into an almost periodic component and a weakly mixing component. See also
the analysis of characteristic factors for the Gowers norms and for multiple recurrence in [TJ 1221 134j.

Applying this result for $\eta := \delta^4/16$ and then using Lemma 4.4 we conclude

Corollary 6.7 (Too few AP4s on a quadratic factor). Let $W$ be an affine space, and
let $f : W \to \mathbb{R}$ be a 1-bounded non-negative function. Set $\delta := \mathbb{E}_W(f)$. Suppose that

$$|\Lambda_{W,W,W,W}(f, f, f, f) - \Lambda_{W,W,W,W}(\delta, \delta, \delta, \delta)| \geq \delta^4/2.$$

Then there exists a quadratic factor $(\mathcal{B}_1, \mathcal{B}_2)$ in $W$ of complexity at most $(d_1, d_2)$, where
$d_1 := (32/\delta^4)^{3C_0}$ and $d_2 := (32/\delta^4)^{2C_0}$, such that the function $g := \mathbb{E}(f|\mathcal{B}_2)$ obeys

$$|\Lambda_{W,W,W,W}(g, g, g, g) - \Lambda_{W,W,W,W}(\delta, \delta, \delta, \delta)| \geq \delta^4/4.$$
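The arithmetic behind Corollary 6.7 can be written out in a few lines. The following is a sketch of the deduction from Theorem 6.6 and Lemma 4.4, assuming the constant 4 in Lemma 4.4 and the choice $\eta = \delta^4/16$; here $\Lambda$ abbreviates $\Lambda_{W,W,W,W}$.

```latex
\begin{align*}
\|f - g\|_{U^3(W)} &\le \eta = \delta^4/16
  && \text{(Theorem 6.6, with } g := \mathbb{E}(f|\mathcal{B}_2)\text{)} \\
|\Lambda(f,f,f,f) - \Lambda(g,g,g,g)| &\le 4\eta = \delta^4/4
  && \text{(Lemma 4.4; } g \text{ is 1-bounded)} \\
|\Lambda(g,g,g,g) - \Lambda(\delta,\delta,\delta,\delta)|
  &\ge \delta^4/2 - \delta^4/4 = \delta^4/4
  && \text{(triangle inequality)} \\
2/\eta &= 32/\delta^4
  && \text{(which gives the stated } d_1, d_2\text{)}
\end{align*}
```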

The factor $(\mathcal{B}_1, \mathcal{B}_2)$ is closely related to the ergodic theory concept of a characteristic
factor for the problem of obtaining 4-term recurrence for $f$. The corollary thus replaces
the study of $f$ (which could essentially be an arbitrary function) by a much lower
complexity object $g$, which in principle can be described explicitly using a bounded
number of linear and quadratic phase functions (cf. [1]). It will be the first component
of the proof of Theorem 4.1.

Of course, it remains to understand $g$, and more precisely to count progressions of length
four in $g$. This will be the second component of the proof of Theorem 4.1 and will require
a certain amount of (fairly standard) analysis of the geometry, algebra, and Fourier-analytic
structure of quadratic phase functions and their associated level sets. This will
be the objective of the remaining sections of the paper.

7. Affine quadratic geometry 

In order to analyze $g$, we first must understand the geometry of quadratic phase functions.
From an algebraic perspective, at least, the structure of quadratic phase functions
can be easily understood. Given a quadratic phase function $\phi$ on an affine space $W$,
define the gradient $\nabla\phi : W \to W$ and the Hessian $\nabla^2\phi : W \to W$ by requiring the
Taylor expansion

\[ \phi(x + h) = \phi(x) + \nabla\phi(x) \cdot h + \tfrac{1}{2}\nabla^2\phi\, h \cdot h \tag{7.1} \]

for all $x \in W$ and $h \in W$. Indeed we have the explicit formulae

\[ \nabla^2\phi\, h_1 \cdot h_2 = \phi(x + h_1 + h_2) - \phi(x + h_1) - \phi(x + h_2) + \phi(x) \]

(note the right-hand side is in fact independent of $x$) and

\[ \nabla\phi(x) \cdot h = \tfrac{1}{2}\big(\phi(x + h) - \phi(x - h)\big); \]

here we exploit the hypothesis that $|F|$ is odd in order to be able to divide by 2.
Observe that $\nabla\phi$ is linear and that $\nabla^2\phi$ is a self-adjoint linear transformation from $W$
to $W$, which vanishes if and only if $\phi$ is linear (in which case $\nabla\phi(x) = \nabla\phi$ is constant);
this combined with (7.1) shows that quadratic phase functions do indeed have the form
$\phi(x) = \nu(Mx \cdot x + \xi \cdot x) + c$ as claimed earlier. Note that if one chooses a basis $(e_1, \dots, e_n)$
for $F^n$ then, in the associated coordinate system, $\phi$ takes the rather concrete form

\[ \phi(x_1, \dots, x_n) = \nu\Big(\sum_{i,j=1}^{n} M_{ij} x_i x_j + \sum_{i=1}^{n} \xi_i x_i\Big) + c. \]

Furthermore, $\nabla^2\phi$ has a null space $\ker(\nabla^2\phi)$, which is a linear subspace of $W$; observe
that $\phi$ becomes linear when restricted to any coset of this space. The codimension
of this null space will be referred to as the rank $\mathrm{rank}(\phi)$ of $\phi$. Intuitively, this rank
measures how close the quadratic phase function is to being linear.
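
The explicit formulae above can be sanity-checked numerically. The following is a minimal sketch, assuming $F = F_5$, $W = F^3$, and a randomly chosen quadratic polynomial standing in for the phase (the names `phi`, `hessian` and `gradient` are illustrative only and do not come from the paper). It checks the Taylor expansion (7.1) and the independence of the second difference from the base point.

```python
import itertools
import random

P = 5   # assumed field F = F_5 (odd characteristic, so division by 2 is allowed)
N = 3   # assumed dimension of W = F^N

rng = random.Random(0)
# Hypothetical quadratic phase phi(x) = Mx.x + xi.x + c with M symmetric.
M = [[0] * N for _ in range(N)]
for i in range(N):
    for j in range(i, N):
        M[i][j] = M[j][i] = rng.randrange(P)
XI = [rng.randrange(P) for _ in range(N)]
C = rng.randrange(P)
INV2 = pow(2, -1, P)  # inverse of 2 in F_P (needs Python 3.8+)
POINTS = list(itertools.product(range(P), repeat=N))


def phi(x):
    quad = sum(M[i][j] * x[i] * x[j] for i in range(N) for j in range(N))
    lin = sum(XI[i] * x[i] for i in range(N))
    return (quad + lin + C) % P


def shift(x, h, t=1):
    """Return x + t*h in F_P^N."""
    return tuple((a + t * b) % P for a, b in zip(x, h))


def hessian(x, h1, h2):
    """Second difference of phi; plays the role of (Hessian h1).h2."""
    return (phi(shift(shift(x, h1), h2)) - phi(shift(x, h1))
            - phi(shift(x, h2)) + phi(x)) % P


def gradient(x, h):
    """Symmetric difference of phi divided by 2; plays the role of grad phi(x).h."""
    return INV2 * (phi(shift(x, h)) - phi(shift(x, h, -1))) % P


def taylor_holds():
    """Check phi(x+h) = phi(x) + grad phi(x).h + (1/2) Hessian h.h for all x, h."""
    return all(
        phi(shift(x, h)) == (phi(x) + gradient(x, h) + INV2 * hessian(x, h, h)) % P
        for x in POINTS for h in POINTS
    )


def hessian_independent_of_x():
    """Check that the second difference does not depend on the base point x."""
    origin = (0,) * N
    sample = rng.sample(POINTS, 10)
    return all(
        hessian(x, h1, h2) == hessian(origin, h1, h2)
        for x in sample for h1 in sample for h2 in sample
    )
```

Calling `taylor_holds()` and `hessian_independent_of_x()` should both return `True` for any symmetric `M`; the check works modulo 5 with integer representatives rather than in $\mathbb{R}/\mathbb{Z}$.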
One can easily check (from (7.1)) that a quadratic phase function takes at most $|F|$
values; indeed, after shifting the phase by a constant phase in $\mathbb{R}/\mathbb{Z}$ we may assume
that the quadratic phase function takes values in the discrete group $T_F := \{x \in \mathbb{R}/\mathbb{Z} :
|F|x = 0\}$. Let us refer to such phase functions as discretized. Note that the space of
discretized quadratic phase functions is itself a finite-dimensional vector space over $F$.

We are now in a position to study the structure of quadratic factors, and in particular 
their atoms (which are nothing more than quadratic varieties). We begin with the 
classical result of Chevalley and Warning. 

Lemma 7.1 (Chevalley-Warning theorem). Let $W$ be an affine space, and let $\mathcal{B}$ be a
pure quadratic factor of complexity strictly less than $\dim(W)/2$. Then every atom of $\mathcal{B}$
has size a multiple of $|F|$. In particular, if an atom contains one point $x_0 \in W$, then it
must contain at least one further point in $W$.

Proof. We may translate $W$ to be a linear space, which we then identify with $F^n$.
Subtracting constant terms as appropriate, we may write any atom $A$ of $\mathcal{B}$ as

\[ A = \{x \in F^n : \phi_1(x) = \dots = \phi_d(x) = 0\} \]

for some discretized quadratic phases $\phi_1, \dots, \phi_d : W \to T_F$ and some $d < n/2$. Identifying
$T_F$ with $F$ and writing $\phi_1, \dots, \phi_d$ in coordinates, it follows that

\[ A = \{x \in F^n : Q_1(x) = \dots = Q_d(x) = 0\} \]

for some quadratic polynomials $Q_1, \dots, Q_d : F^n \to F$. Modulo $|F|$, we thus have

\[ |A| = \sum_{x \in F^n} \prod_{j=1}^{d} \big(1 - Q_j(x)^{|F|-1}\big) \pmod{|F|}. \tag{7.2} \]

The product has degree at most $2d(|F| - 1)$. Since $d < n/2$, we see after writing $x =
(x_1, \dots, x_n)$ that none of the monomials in this product are multiples of $x_1^{|F|-1} \cdots x_n^{|F|-1}$.
The right-hand side of (7.2) therefore vanishes, and we are done. □
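
As an illustration of Lemma 7.1, here is a small brute-force sketch. It assumes $F = F_3$, $n = 5$ and $d = 2$ (so that $d < n/2$), with randomly generated quadratic polynomials; the helper names `random_quadratic` and `atom_sizes` are hypothetical. It tabulates the sizes of all non-empty atoms, that is, the joint level sets of $(Q_1, Q_2)$.

```python
import itertools
import random
from collections import Counter


def random_quadratic(n, p, rng):
    """Return a random polynomial of degree at most 2 in n variables over F_p."""
    quad = {(i, j): rng.randrange(p) for i in range(n) for j in range(i, n)}
    lin = [rng.randrange(p) for _ in range(n)]
    const = rng.randrange(p)

    def q(x):
        value = const + sum(c * a for c, a in zip(lin, x))
        value += sum(c * x[i] * x[j] for (i, j), c in quad.items())
        return value % p

    return q


def atom_sizes(n, d, p, seed=0):
    """Map each value vector (Q_1(x), ..., Q_d(x)) to the size of its level set."""
    rng = random.Random(seed)
    polys = [random_quadratic(n, p, rng) for _ in range(d)]
    return Counter(
        tuple(q(x) for q in polys)
        for x in itertools.product(range(p), repeat=n)
    )
```

For instance, `atom_sizes(5, 2, 3)` partitions the $3^5 = 243$ points of $F_3^5$ into atoms, and every recorded size should be divisible by 3, in agreement with the lemma.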

We now apply this lemma to obtain large linear spaces inside quadratic varieties. We 
begin with a homogeneous statement. 

Lemma 7.2 (Quadratic forms have large null spaces). Let $G$ be a vector space, and
let $M_1, \dots, M_d$ be self-adjoint linear transformations from $G$ to $G$. Then there exists a
linear subspace $W$ of $G$ with

\[ \dim W \geq \frac{1}{d+1}\dim(G) - \frac{2d}{d+1} \]

such that $M_j x \cdot y = 0$ for all $j$, $1 \leq j \leq d$ and for all $x, y \in W$.

Proof. Let $W$ be a maximal linear subspace of $G$ which is null with respect to all the
$M_j$ (i.e. $M_j x \cdot y = 0$ for all $j$, $1 \leq j \leq d$ and for all $x, y \in W$). Let $W^\perp$ be the linear
subspace

\[ W^\perp := \{x \in G : M_j x \cdot y = 0 \text{ for all } 1 \leq j \leq d \text{ and } y \in W\}; \]

thus $W^\perp \supseteq W$. From linear algebra we also see that

\[ \dim(W^\perp) \geq \dim(G) - d\dim(W) \]

and thus after some algebra

\[ \dim(W) \geq \frac{1}{d+1}\dim(G) - \frac{1}{d+1}\dim(W^\perp/W). \]

Observe that the quadratic forms $Q_j(x) := M_j x \cdot x$ are well-defined on $W^\perp/W$. The zero
locus $\{x \in W^\perp/W : Q_j(x) = 0 \text{ for all } j, 1 \leq j \leq d\}$ consists only of the origin, since
otherwise we could extend $W$ by one additional dimension and contradict maximality.
In particular, the cardinality of this zero locus is not a multiple of $|F|$. Applying Lemma
7.1 we conclude that $\dim(W^\perp/W) \leq 2d$, and the claim follows. □

Corollary 7.3 (Linearization of quadratic factors). Let $(\mathcal{B}_1, \mathcal{B}_2)$ be a quadratic factor
on an affine space $W$ of complexity at most $(d_1, d_2)$. Then each atom of $\mathcal{B}_2$ can be
partitioned into disjoint affine spaces each of dimension at least

\[ \frac{\dim(W)}{d_2+1} - \frac{d_1 + 2d_2}{d_2+1} - d_2. \]



Proof. By working on each atom of $\mathcal{B}_1$ separately, one sees that it suffices to verify
this claim for pure quadratic factors. Thus we may take $\mathcal{B}_1$ to be trivial and $\mathcal{B}_2 =
\mathcal{B}_{\phi_1} \vee \dots \vee \mathcal{B}_{\phi_{d_2}}$ for some quadratic phase functions $\phi_1, \dots, \phi_{d_2}$. By the preceding lemma
we can find a linear subspace $W'$ of $W$ of dimension

\[ \dim(W') \geq \frac{1}{d_2+1}\dim(W) - \frac{2d_2}{d_2+1} \]

which is null with respect to all of the $\nabla^2\phi_j$. In particular, this implies that $\phi_1, \dots, \phi_{d_2}$
are linear on each of the cosets of $W'$ in $W$. Thus one can refine each such coset of $W'$
further into affine spaces of dimension at least $\dim(W') - d_2$, on which each of the
$\phi_1, \dots, \phi_{d_2}$ are constant. These spaces form a partition of the atoms of $\mathcal{B}_2$, and the
claim follows. □

As a consequence of this, we see that a density increment on a quadratic factor implies 
a density increment on a subspace, albeit at the expense of reducing the domain of the 
density increment substantially. 

Corollary 7.4 (Linearization of quadratic density increment). Let $f : W \to \mathbb{R}$ be
a real-valued function on an affine space $W$, and let $(\mathcal{B}_1, \mathcal{B}_2)$ be a quadratic factor of
complexity at most $(d_1, d_2)$. Let $A$ be an atom of $\mathcal{B}_2$. Then there exists an affine subspace
$W'$ of $W$ of dimension

\[ \dim(W') \geq \frac{1}{d_2+1}\dim(W) - \frac{d_1 + 2d_2}{d_2+1} - d_2 \]

such that $\mathbb{E}_{W'}(f) \geq \mathbb{E}_A(f)$.



Proof. From the preceding corollary, we can write $A$ as the disjoint union of affine
spaces of dimension at least $\frac{1}{d_2+1}\dim(W) - \frac{d_1+2d_2}{d_2+1} - d_2$. The claim then follows from
the pigeonhole principle. □
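
To get a feel for how much dimension is lost in this linearization step, the bound of Corollary 7.4 can be evaluated exactly. This is only an arithmetic sketch; the function name `linearized_dimension_bound` is an assumption made for illustration.

```python
from fractions import Fraction


def linearized_dimension_bound(dim_w, d1, d2):
    """Lower bound dim(W)/(d2+1) - (d1 + 2*d2)/(d2+1) - d2 from Corollary 7.4."""
    return Fraction(dim_w - d1 - 2 * d2, d2 + 1) - d2
```

For example, `linearized_dimension_bound(100, 3, 4)` evaluates to `Fraction(69, 5)`, so a 100-dimensional space with a factor of complexity $(3, 4)$ is only guaranteed a subspace of dimension at least 14 after rounding up. The division by $d_2 + 1$ is the "expensive" part of the corollary.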

We now record a simple linear variant of this which will be used in 
Lemma 7.5. Let $f : W \to \mathbb{R}$ be a real-valued function on an affine space $W$, and let $\mathcal{B}$
be a linear factor of complexity at most $d$. Then there exists an affine subspace $W'$ of
$W$ with $\dim(W') \geq \dim(W) - d$ such that $\mathbb{E}_{W'}(f) \geq \mathbb{E}_W(f) + \frac{1}{2}\|\mathbb{E}(f|\mathcal{B}) - \mathbb{E}_W(f)\|_{L^1(W)}$.

Proof. Suppose that $g : W \to \mathbb{R}$ is any function with mean zero. Then we have
$\mathbb{E}(g + |g|) = \|g\|_1$, which implies that there is some $x \in W$ such that $g(x) + |g(x)| \geq
\|g\|_1$. For such an $x$ we clearly have $g(x) \geq \frac{1}{2}\|g\|_1$. Applying this observation with
$g := \mathbb{E}(f|\mathcal{B}) - \mathbb{E}_W(f)$, we see that there must exist $x \in W$ such that

\[ \mathbb{E}(f|\mathcal{B})(x) - \mathbb{E}_W(f) \geq \tfrac{1}{2}\|\mathbb{E}(f|\mathcal{B}) - \mathbb{E}_W(f)\|_{L^1(W)}. \]

Letting $W'$ be the atom of $\mathcal{B}$ containing $x$, the claim follows. □
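
The key observation in this proof, that a mean-zero function must somewhere be at least half its $L^1$ norm, is easy to test on random data. The sketch below assumes a finite set of 50 points with uniform measure; `check_mean_zero_bound` is a hypothetical helper name.

```python
import random


def mean(values):
    return sum(values) / len(values)


def check_mean_zero_bound(trials=200, size=50, seed=1):
    """Check max(g) >= 0.5 * E|g| for randomly generated mean-zero g."""
    rng = random.Random(seed)
    for _ in range(trials):
        g = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        m = mean(g)
        g = [v - m for v in g]              # recentre so that E(g) = 0
        l1 = mean([abs(v) for v in g])      # normalised L^1 norm
        if max(g) < 0.5 * l1 - 1e-12:       # tolerance for rounding error
            return False
    return True
```

The factor $\frac{1}{2}$ is sharp: a function taking only the values $\pm 1$ with equal frequency has $\max g = 1 = \frac{1}{2}\|g + |g|\|_1$ and $\|g\|_1 = 1$ only up to that factor when the positive part is spread evenly.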

At this point we can already conclude a cheap version of Theorem 4.1, the density
increment result:

Proposition 7.6 (Cheap density increment for 4APs). Let $W$ be an affine space, and
let $f : W \to \mathbb{R}$ be a 1-bounded non-negative function. Set $\delta := \mathbb{E}_W(f)$. Suppose that

\[ |\Lambda_{W,W,W,W}(f, f, f, f) - \Lambda_{W,W,W,W}(\delta, \delta, \delta, \delta)| \geq \delta^4/2. \]

Then there exists an affine subspace $W'$ of $W$ satisfying the dimension bound

\[ \dim(W') \geq (\delta^4/32)^{2C_0+1}\dim(W) - (32/\delta^4)^{2C_0+1} \]

and such that we have the density increment 

Remark. An iteration of the above proposition gives a bound of the form

\[ r_4(F^n) \ll N e^{-c\sqrt{\log\log N}} \tag{7.3} \]

for some absolute constant $c > 0$ (recall that $N := |F|^n$). We leave the verification
of this to the reader, remarking that it is very similar to the deduction of Theorem
1.1 from Theorem 4.1 as given in §4. The bound (7.3) is better than the previously
best-known result, (1.5), but substantially weaker than Theorem 1.1. It does enjoy the
advantage of being easier to adapt to groups more general than $F^n$: for details see [17].



Proof of Proposition 7.6. We apply Corollary 6.7 to obtain a quadratic factor $(\mathcal{B}_1, \mathcal{B}_2)$
in $W$ of complexity at most $((32/\delta^4)^{3C_0}, (32/\delta^4)^{2C_0})$ such that the function $g := \mathbb{E}(f|\mathcal{B}_2)$
obeys

\[ |\Lambda_{W,W,W,W}(g, g, g, g) - \Lambda_{W,W,W,W}(\delta, \delta, \delta, \delta)| \geq \delta^4/4. \]

We claim that H^U^oo^) ^ 5 + ^. For if this were not the case, then from Lemma 
we would have 

x 

\^w,w,wM9,9,9,9) -A WtW)WtW (6,6,5,S)\ < 4(5 + ^) 3 \\g - 8\\mw) 



and hence certainly 
Since $g - \delta$ has mean zero, we see (cf. the remarks at the beginning of the proof of
Lemma [7.51) that the maximum value of g — 5 is at least ^, a contradiction. Thus we 
indeed have 

\mf\B 2 )\\ LOO(w) >5+ A 

In particular there exists an atom A of B 2 such that E^(/) ^ Ew(/) + ^. The claim 
now follows from Corollary 7.4. □

The above argument was very crude, as it relied on the rather low-technology estimate
in Lemma 4.2. In particular, the quadratic structure of $\mathcal{B}_2$ was not used, except in
the final step of Corollary 7.4 to convert the density increment on an atom of $\mathcal{B}_2$ to
a density increment on a subspace. In the next section we refine these computations
by exploiting some "mixing" properties of the quadratic factor to obtain some further
concentration properties of $g$; only after obtaining such properties do we invoke the
(expensive) Corollary 7.4.



8. Quadratic mixing 

We have already seen the Chevalley-Warning theorem (Lemma 7.1) which gives some
control on the size of quadratic atoms. It turns out that one can do substantially better
than this if we assume a non-degeneracy condition on the quadratic phases which define
the atom.

It is convenient to work for now in a homogenized setting, returning to the affine setting 
later. 

Definition 8.1 (Homogeneity). Let $W$ be a vector space (i.e. an affine space with a
distinguished origin 0). A homogeneous quadratic phase function on $W$ is a quadratic
phase function $\phi : W \to \mathbb{R}/\mathbb{Z}$ such that $\phi(0) = \nabla\phi(0) = 0$ (thus $\phi(x) = \frac{1}{2}\nabla^2\phi\, x \cdot x$). A
homogeneous linear phase function on $W$ is a linear phase function $\phi : W \to \mathbb{R}/\mathbb{Z}$ such
that $\phi(0) = 0$ (thus $\phi(x) = \nabla\phi \cdot x$). A homogenized quadratic factor with complexity
$(d_1, d_2)$ on $W$ is any factor which is generated by $d_1$ homogeneous linear phase functions
and $d_2$ homogeneous quadratics, with these $d_1 + d_2$ discretized phase functions being
linearly independent over $F$.

Note that any pure quadratic factor of complexity at most $d$ on a vector space $W$ can
be extended to a homogenized quadratic factor of complexity at most $(d, d)$, simply by
taking all the quadratic phases generating the original factor and breaking them up into
homogeneous quadratic and homogeneous linear components (dropping the constant
terms, which are not relevant), and then eliminating any linearly dependent terms.

Now we define the rank of a homogenized quadratic factor. 

Definition 8.2 (Rank). Let $\mathcal{B} = \mathcal{B}_{\gamma_1} \vee \dots \vee \mathcal{B}_{\gamma_{d_1}} \vee \mathcal{B}_{\phi_1} \vee \dots \vee \mathcal{B}_{\phi_{d_2}}$ be a homogenized
quadratic factor of complexity $(d_1, d_2)$ on a vector space $W$, generated by $d_1$ homogeneous
linear phases $\gamma_i$ and $d_2$ homogeneous quadratic phases $\phi_j$. We define the rank of
the factor to be the minimal rank of $\phi$, where $\phi$ ranges over all linear combinations

\[ \phi = \lambda_1\phi_1 + \dots + \lambda_{d_2}\phi_{d_2}, \]

where the $\lambda_j$ are elements of $F$, not all zero, of $\phi_1, \dots, \phi_{d_2}$ which are not identically
zero. (If $d_2 = 0$, we define the rank to be infinite.)

Intuitively, quadratic factors $\mathcal{B}$ with high rank are highly nonlinear, and as such have
a certain amount of "mixing". In practice this means that many quantities involving
$\mathcal{B}$-measurable functions can be easily understood by working in configuration space$^4$
$T_F^{d_1} \times T_F^{d_2}$. If $\mathcal{B} = \mathcal{B}_{\gamma_1} \vee \dots \vee \mathcal{B}_{\gamma_{d_1}} \vee \mathcal{B}_{\phi_1} \vee \dots \vee \mathcal{B}_{\phi_{d_2}}$ is a homogenized quadratic factor of
complexity $(d_1, d_2)$ on a linear space $W$ then we write $\Gamma : W \to T_F^{d_1}$ and $\Phi : W \to T_F^{d_2}$
for the maps $\Gamma(x) := (\gamma_1(x), \dots, \gamma_{d_1}(x))$ and $\Phi(x) := (\phi_1(x), \dots, \phi_{d_2}(x))$. If $f : W \to \mathbb{C}$
is a 1-bounded $\mathcal{B}$-measurable function then we write $\mathbf{f} : T_F^{d_1} \times T_F^{d_2} \to \mathbb{C}$ for the function
which satisfies

\[ f(x) = \mathbf{f}(\Gamma(x), \Phi(x)) \]

for all $x \in W$ and $\mathbf{f}(x_1, x_2) = 0$ if $(x_1, x_2) \neq (\Gamma(x), \Phi(x))$ for any$^5$ $x \in W$. We will adopt
this convention of using bold letters to denote functions on configuration space for the
rest of the paper without further comment.

One basic formulation of this principle is given in Lemma 8.4 below. Before we can
prove it, we recall a well-known bound on the magnitude of Gauss sums.

Lemma 8.3 (Gauss sums). Suppose that $W$ is a linear space and that $\phi : W \to \mathbb{R}/\mathbb{Z}$
is a quadratic form with rank $r$. Then we have the estimate

\[ |\mathbb{E}_x e(\phi(x))| \leq |F|^{-r/2}. \]

Remark. Note that the estimate is invariant under adding an arbitrary linear phase to
the quadratic form $\phi$.

Proof. Write $G := |\mathbb{E}_x e(\phi(x))|$. Squaring and changing variables, we have

\[ G^2 = |\mathbb{E}_{x,h} e(\phi(x + h) - \phi(x))|. \]

Using the Taylor expansion (7.1) and applying the triangle inequality, this gives

\[ G^2 \leq \mathbb{E}_x |\mathbb{E}_h e(\nabla\phi(x) \cdot h)| = \mathbb{P}_x(\nabla\phi(x) = 0) = \frac{|\ker(\nabla^2\phi)|}{|W|} = |F|^{-r}. \] □
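
The Gauss sum estimate can be checked by direct computation in a small case. The sketch below assumes $F = F_p$ for an odd prime $p$ and a diagonal form $\phi(x) = \frac{1}{p}\sum_i a_i x_i^2$, whose rank is the number of coefficients $a_i$ that are non-zero modulo $p$; `gauss_average` is an illustrative name, not notation from the paper.

```python
import cmath
import itertools
import math


def gauss_average(diag, p):
    """Return |E_x e(phi(x))| for phi(x) = (sum_i diag[i] * x_i^2) / p on F_p^n."""
    n = len(diag)
    total = 0j
    for x in itertools.product(range(p), repeat=n):
        q = sum(a * v * v for a, v in zip(diag, x)) % p
        total += cmath.exp(2j * math.pi * q / p)   # e(q/p) with e(t) = exp(2 pi i t)
    return abs(total) / p ** n
```

With `diag = [1, 2, 0]` and `p = 5` the form has rank 2, and `gauss_average([1, 2, 0], 5)` should agree with $5^{-2/2} = 0.2$ up to rounding. For diagonal forms over a prime field the bound of Lemma 8.3 holds with equality, since each one-dimensional quadratic Gauss sum has modulus $\sqrt{p}$.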

Lemma 8.4 (Expectation on quadratic factors). Let $\mathcal{B}$ be a homogenized quadratic
factor of complexity $(d_1, d_2)$ on a linear space $W$, with rank at least $r$. Let $\Gamma, \Phi$ be
the maps from $W$ to configuration space $T_F^{d_1} \times T_F^{d_2}$. Let $f : W \to \mathbb{C}$ be a 1-bounded



4 This configuration space can be viewed as a discrete finitary analogue of the 2-step nilmanifolds 
which arise naturally in the study of characteristic factors for the U 3 norm or four-term recurrence; 
see [H |21 ES]. 

5 The definition of $\mathbf{f}$ outside the range of $\Gamma \times \Phi$ is made merely for definiteness. In later contexts, as
the reader may check, $\Gamma \times \Phi : W \to T_F^{d_1} \times T_F^{d_2}$ will always be surjective.
$\mathcal{B}$-measurable function, and let $\mathbf{f}$ be the corresponding function on configuration space.
Then we have

\[ |\mathbb{E}_W(f) - \mathbb{E}_{T_F^{d_1} \times T_F^{d_2}}(\mathbf{f})| \leq |F|^{(d_1 + d_2 - r)/2}. \]
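Proof. We employ a Fourier expansion on configuration space.$^6$ Writing

\[ \hat{\mathbf{f}}(\xi_1, \xi_2) := \mathbb{E}_{x_1 \in T_F^{d_1},\, x_2 \in T_F^{d_2}} \mathbf{f}(x_1, x_2)\, e(-\xi_1 \cdot x_1 - \xi_2 \cdot x_2), \]

this allows us to write

\[ \mathbb{E}_W(f) = \sum_{(\xi_1, \xi_2) \in F^{d_1} \times F^{d_2}} \hat{\mathbf{f}}(\xi_1, \xi_2)\, \mathbb{E}_{x \in W} e(\xi_1 \cdot \Gamma(x) + \xi_2 \cdot \Phi(x)). \]

Since $\hat{\mathbf{f}}(0, 0) = \mathbb{E}_{T_F^{d_1} \times T_F^{d_2}}(\mathbf{f})$, we conclude that

\[ \mathbb{E}_W(f) - \mathbb{E}_{T_F^{d_1} \times T_F^{d_2}}(\mathbf{f}) = \sum_{(\xi_1, \xi_2) \in F^{d_1} \times F^{d_2} \setminus (0,0)} \hat{\mathbf{f}}(\xi_1, \xi_2)\, \mathbb{E}_{x \in W} e(\xi_1 \cdot \Gamma(x) + \xi_2 \cdot \Phi(x)). \]

Now from the rank hypotheses we see that $\xi_1 \cdot \Gamma(x) + \xi_2 \cdot \Phi(x)$ either is a non-constant
linear phase, or is a non-linear quadratic phase of rank at least $r$. In the former case,
the expectation appearing above is zero, whereas in the latter case the expectation has
magnitude at most $|F|^{-r/2}$ by Lemma 8.3. Thus by the triangle inequality we have

\[ |\mathbb{E}_W(f) - \mathbb{E}_{T_F^{d_1} \times T_F^{d_2}}(\mathbf{f})| \leq |F|^{-r/2} \sum_{(\xi_1, \xi_2) \in F^{d_1} \times F^{d_2}} |\hat{\mathbf{f}}(\xi_1, \xi_2)|. \]

By Cauchy-Schwarz and Plancherel we have

\[ \sum_{(\xi_1, \xi_2) \in F^{d_1} \times F^{d_2}} |\hat{\mathbf{f}}(\xi_1, \xi_2)| \leq |F|^{(d_1 + d_2)/2} \|\mathbf{f}\|_{L^2(T_F^{d_1} \times T_F^{d_2})} \]

and the claim now follows from the 1-boundedness of $\mathbf{f}$. □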
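
A toy instance of Lemma 8.4 can be computed by brute force. The sketch below assumes the simplest possible factor: $d_1 = 0$, $d_2 = 1$, generated by a single diagonal quadratic form over $F_p$ with all coefficients non-zero, so the rank is $r = n$. The test function is the indicator of the atom $\{\phi = 0\}$, whose mean on configuration space is $1/p$. The name `expectation_gap` is hypothetical.

```python
import itertools


def expectation_gap(diag, p):
    """Return |E_W(f) - E_T(f)| for f = 1_{Q = 0}, Q(x) = sum_i diag[i] * x_i^2 over F_p."""
    n = len(diag)
    zeros = sum(
        1
        for x in itertools.product(range(p), repeat=n)
        if sum(a * v * v for a, v in zip(diag, x)) % p == 0
    )
    mean_on_w = zeros / p ** n      # E_W(f): density of the atom {Q = 0} in W
    mean_on_config = 1 / p          # mean of the indicator of {0} on T_F (d_2 = 1)
    return abs(mean_on_w - mean_on_config)
```

For `diag = [1, 1, 1, 1]` and `p = 5` the lemma predicts a gap of at most $5^{(0 + 1 - 4)/2} = 5^{-3/2} \approx 0.089$, and the computed value should fall comfortably below this.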

We turn now to the somewhat more complicated task of counting 4-term arithmetic progressions
using the configuration space, beginning with a heuristic discussion. Suppose
that $f_0, f_1, f_2, f_3$ are $\mathcal{B}$-measurable functions, and that we wish to compute

\[ \Lambda_{W,W,W,W}(f_0, f_1, f_2, f_3). \]

To see what one would expect to get, let us write $f_i(x) = \mathbf{f}_i(\Gamma(x), \Phi(x))$ as before, and
expand

\[ \Lambda_{W,W,W,W}(f_0, f_1, f_2, f_3) = \mathbb{E}_{x,h \in W} \prod_{i=0}^{3} \mathbf{f}_i(\Gamma(x + ih), \Phi(x + ih)). \]

It is then natural to ask what the constraints are between the quantities $\Gamma(x + ih)$ and
$\Phi(x + ih)$. From the linearity of $\Gamma$ and the quadratic nature of $\Phi$ one can easily deduce
the constraints

\[ \Gamma(x),\ \Gamma(x + h),\ \Gamma(x + 2h),\ \Gamma(x + 3h) \text{ are in arithmetic progression} \]



6 Note that $T_F^d$ is a $d$-dimensional vector space over $F$, and its Pontryagin dual is naturally identified
with $F^d$. We persist with the notation $T_F^{d_1} \times T_F^{d_2}$ to help the reader, who should remember that any
such vector space is being used to label atoms in a quadratic factor.
and

\[ \Phi(x) - 3\Phi(x + h) + 3\Phi(x + 2h) - \Phi(x + 3h) = 0. \]

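It turns out that if the rank $r$ is sufficiently large then these are in some sense the "only"
constraints, and furthermore there is a certain uniform distribution among all the values
of $\Gamma(x + ih)$ and $\Phi(x + ih)$ obeying these constraints. This leads to the heuristic formula

\[ \Lambda_{W,W,W,W}(f_0, f_1, f_2, f_3) \approx \mathbb{E}_{x_1, h_1 \in T_F^{d_1}}\ \mathbb{E}_{\substack{x_{2,0}, x_{2,1}, x_{2,2}, x_{2,3} \in T_F^{d_2} \\ x_{2,0} - 3x_{2,1} + 3x_{2,2} - x_{2,3} = 0}}\ \prod_{i=0}^{3} \mathbf{f}_i(x_1 + ih_1, x_{2,i}) \]

which can be rearranged using the Fourier transform in the $T_F^{d_2}$ variables as

\[ \Lambda_{W,W,W,W}(f_0, f_1, f_2, f_3) \approx \mathbb{E}_{x_1, h_1 \in T_F^{d_1}} \sum_{\xi \in F^{d_2}} \hat{\mathbf{f}}_0(x_1, \xi)\, \hat{\mathbf{f}}_1(x_1 + h_1, -3\xi)\, \hat{\mathbf{f}}_2(x_1 + 2h_1, 3\xi)\, \hat{\mathbf{f}}_3(x_1 + 3h_1, -\xi) \]

where $\hat{\mathbf{f}}$ is the partial Fourier transform of $\mathbf{f}$,

\[ \hat{\mathbf{f}}(x_1, \xi) := \mathbb{E}_{x_2 \in T_F^{d_2}} \mathbf{f}(x_1, x_2)\, e(-\xi \cdot x_2). \tag{8.1} \]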

Let us remark that these formulae are closely related to the computations on 2-step
nilmanifolds in [1]. One can view $T_F^{d_1} \times T_F^{d_2}$ as an "abelian extension" of the "Kronecker
factor" $T_F^{d_1}$, thus creating a discrete analogue of a 2-step nilmanifold. The above formula
then is computing $\Lambda$ by taking the Fourier transform in the abelian extension variable.
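
The two constraints used in this heuristic (linear phases along a progression form a progression, and the third difference of a quadratic phase vanishes) can be verified exhaustively in a small case. The sketch below assumes $F = F_5$ and $W = F^3$, with one randomly chosen homogeneous linear phase and one homogeneous quadratic phase; all names are illustrative.

```python
import itertools
import random


def check_progression_constraints(p=5, n=3, seed=0):
    """Check both constraints for every pair (x, h) in F_p^n x F_p^n."""
    rng = random.Random(seed)
    gamma = [rng.randrange(p) for _ in range(n)]   # hypothetical linear phase
    m = [[0] * n for _ in range(n)]                # hypothetical symmetric matrix
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.randrange(p)

    def lin(x):
        return sum(g * a for g, a in zip(gamma, x)) % p

    def quad(x):
        return sum(m[i][j] * x[i] * x[j] for i in range(n) for j in range(n)) % p

    points = list(itertools.product(range(p), repeat=n))
    for x in points:
        for h in points:
            ap = [tuple((a + i * b) % p for a, b in zip(x, h)) for i in range(4)]
            g = [lin(q) for q in ap]
            # Linear phase: consecutive differences along the progression agree.
            if len({(g[i + 1] - g[i]) % p for i in range(3)}) != 1:
                return False
            # Quadratic phase: the third difference vanishes.
            if (quad(ap[0]) - 3 * quad(ap[1]) + 3 * quad(ap[2]) - quad(ap[3])) % p != 0:
                return False
    return True
```

The check passes for any choice of `gamma` and `m`, which reflects the fact that these constraints are purely algebraic. The content of the high-rank hypothesis is that no further constraints are present.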

The next lemma constitutes the rigorous version of the above heuristics. 

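Lemma 8.5 ($\Lambda$ on quadratic factors). Let $\mathcal{B}$ be a homogenized quadratic factor of
complexity $(d_1, d_2)$ on a linear space $W$, with rank at least $r$. Let $\Gamma, \Phi$ be the maps
from $W$ to configuration space $T_F^{d_1} \times T_F^{d_2}$. For $i = 0, 1, 2, 3$, let $f_i(x) = \mathbf{f}_i(\Gamma(x), \Phi(x))$ be
1-bounded $\mathcal{B}$-measurable functions. Then we have

\[ \Big| \Lambda_{W,W,W,W}(f_0, f_1, f_2, f_3) - \mathbb{E}_{x_1, h_1 \in T_F^{d_1}} \sum_{\xi \in F^{d_2}} \hat{\mathbf{f}}_0(x_1, \xi)\, \hat{\mathbf{f}}_1(x_1 + h_1, -3\xi)\, \hat{\mathbf{f}}_2(x_1 + 2h_1, 3\xi)\, \hat{\mathbf{f}}_3(x_1 + 3h_1, -\xi) \Big| \leq |F|^{(4d_1 + 4d_2 - r)/2} \tag{8.2} \]

where $\hat{\mathbf{f}}_i$ is defined by (8.1).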

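Proof. We use the total Fourier expansion

\[ \mathbf{f}_i(x_1, x_2) = \sum_{\lambda \in F^{d_1},\, \xi \in F^{d_2}} \hat{\mathbf{f}}_i(\lambda, \xi)\, e(\lambda \cdot x_1 + \xi \cdot x_2) \]

(where $\hat{\mathbf{f}}_i(\lambda, \xi)$, with two frequency arguments, denotes the Fourier coefficient in both variables)
to obtain

\[ \Lambda_{W,W,W,W}(f_0, f_1, f_2, f_3) = \sum_{\lambda \in (F^{d_1})^4,\ \xi \in (F^{d_2})^4} m(\lambda, \xi) \prod_{i=0}^{3} \hat{\mathbf{f}}_i(\lambda_i, \xi_i) \tag{8.3} \]

where $\lambda = (\lambda_0, \lambda_1, \lambda_2, \lambda_3)$, $\xi = (\xi_0, \xi_1, \xi_2, \xi_3)$ and

\[ m(\lambda, \xi) := \mathbb{E}_{x,h \in W}\, e\Big(\sum_{i=0}^{3} \lambda_i \cdot \Gamma(x + ih) + \xi_i \cdot \Phi(x + ih)\Big). \tag{8.4} \]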

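Meanwhile, we may use the Fourier inversion identity

\[ \hat{\mathbf{f}}_0(x_1, \xi) = \sum_{\lambda_0} \hat{\mathbf{f}}_0(\lambda_0, \xi)\, e(\lambda_0 \cdot x_1) \]

together with similar identities for $\hat{\mathbf{f}}_1(x_1 + h_1, -3\xi)$, $\hat{\mathbf{f}}_2(x_1 + 2h_1, 3\xi)$ and $\hat{\mathbf{f}}_3(x_1 + 3h_1, -\xi)$
to deduce the formula

\[ \mathbb{E}_{x_1, h_1 \in T_F^{d_1}} \sum_{\xi \in F^{d_2}} \hat{\mathbf{f}}_0(x_1, \xi)\, \hat{\mathbf{f}}_1(x_1 + h_1, -3\xi)\, \hat{\mathbf{f}}_2(x_1 + 2h_1, 3\xi)\, \hat{\mathbf{f}}_3(x_1 + 3h_1, -\xi) = \sum_{\lambda \in (F^{d_1})^4,\ \xi \in (F^{d_2})^4} 1_\Sigma(\lambda, \xi) \prod_{i=0}^{3} \hat{\mathbf{f}}_i(\lambda_i, \xi_i), \tag{8.5} \]

where $\Sigma \subseteq (F^{d_1})^4 \times (F^{d_2})^4$ is the set of all pairs $(\lambda, \xi)$ such that

\[ 3\xi_0 = -\xi_1 = \xi_2 = -3\xi_3 \tag{8.6} \]

and

\[ \lambda_0 + \lambda_1 + \lambda_2 + \lambda_3 = \lambda_1 + 2\lambda_2 + 3\lambda_3 = 0. \tag{8.7} \]

We will shortly show that

\[ |m(\lambda, \xi) - 1_\Sigma(\lambda, \xi)| \leq |F|^{-r/2}. \tag{8.8} \]

Assuming this, we can compare (8.3) with (8.5), bounding the left-hand side of (8.2) by

\[ |F|^{-r/2} \sum_{\lambda \in (F^{d_1})^4,\ \xi \in (F^{d_2})^4}\ \prod_{i=0}^{3} |\hat{\mathbf{f}}_i(\lambda_i, \xi_i)|. \]

Applying Cauchy-Schwarz and Plancherel as in the proof of the preceding lemma, we
can bound this by $|F|^{(4d_1 + 4d_2 - r)/2}$ as desired.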

It remains to prove (8.8). First suppose that (8.6) fails, so that $1_\Sigma(\lambda, \xi) = 0$. Then (by
a simple inspection) we can find $i' \in \{0, 1, 2, 3\}$ such that $\sum_{i=0}^{3} (i - i')^2 \xi_i \neq 0$. We can
use the change of variables $x = y - i'h$ to write

\[ m(\lambda, \xi) = \mathbb{E}_{y,h \in W}\, e\Big(\sum_{i=0}^{3} \lambda_i \cdot \Gamma(y + (i - i')h) + \xi_i \cdot \Phi(y + (i - i')h)\Big). \]

It then follows from the rank condition that the phase $\sum_{i=0}^{3} \xi_i \cdot \Phi(y + (i - i')h)$ contains
a non-trivial quadratic component in $h$, of rank at least $r$. Noting that the linear terms
$\lambda_i \cdot \Gamma(y + (i - i')h)$ do not affect the quadratic component of the phase, we conclude by
averaging over $h$ and applying Lemma 8.3 that $m(\lambda, \xi)$ does indeed have magnitude at
most $|F|^{-r/2}$.
Now suppose that (8.6) holds, but (8.7) fails, so again $1_\Sigma(\lambda, \xi) = 0$. Then the quadratic
nature of $\Phi$ ensures that $\sum_{i=0}^{3} \xi_i \cdot \Phi(x + ih) = 0$. Thus

\[ m(\lambda, \xi) = \mathbb{E}_{x,h \in W}\, e\Big(\sum_{i=0}^{3} \lambda_i \cdot \Gamma(x + ih)\Big) = \mathbb{E}_{x,h \in W}\, e\big((\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3) \cdot \Gamma(x) + (\lambda_1 + 2\lambda_2 + 3\lambda_3) \cdot \Gamma(h)\big). \]

The fact that at least one of the vectors $\lambda_0 + \lambda_1 + \lambda_2 + \lambda_3$ and $\lambda_1 + 2\lambda_2 + 3\lambda_3$ is non-zero,
combined with the assumed linear independence of the phases $\gamma_1, \dots, \gamma_{d_1}$ comprising $\Gamma$,
ensures that $m(\lambda, \xi) = 0$.

Finally, when (8.6) and (8.7) both hold we see that $m(\lambda, \xi) = 1 = 1_\Sigma(\lambda, \xi)$ and the
claim is trivial. □
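
The "simple inspection" in the first case of this proof amounts to a small linear-algebra fact: the four conditions $\sum_{i=0}^{3}(i - i')^2\xi_i = 0$, $i' = 0, 1, 2, 3$, hold simultaneously only on the line described by (8.6). The sketch below checks this by brute force for scalar $\xi_i$ over $F_5$ (an assumed special case with $d_2 = 1$); the helper name is hypothetical.

```python
import itertools


def simultaneous_solutions(p):
    """All (xi_0, ..., xi_3) in F_p^4 with sum_i (i - i')^2 xi_i = 0 for i' = 0..3."""
    return [
        xi
        for xi in itertools.product(range(p), repeat=4)
        if all(
            sum((i - ip) ** 2 * xi[i] for i in range(4)) % p == 0
            for ip in range(4)
        )
    ]
```

Over $F_5$ the solutions should be exactly the five multiples of $(1, -3, 3, -1)$, which is the same as the condition $3\xi_0 = -\xi_1 = \xi_2 = -3\xi_3$. The argument needs 2 and 3 to be invertible, matching the standing assumption on the characteristic of $F$.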

Lemma 8.5 leads to the following density increment result.

Theorem 8.6 (Anomalous AP4-count implies density increment). Let $\mathcal{B}$ be a homogenized
quadratic factor of complexity $(d_1, d_2)$ on a linear space $W$ with rank at least $r$.
Let $f_0, f_1, f_2, f_3 : W \to [0, 1]$ be 1-bounded non-negative $\mathcal{B}$-measurable functions which
obey the estimates

\[ |\Lambda_{W,W,W,W}(f_0, f_1, f_2, f_3) - \mathbb{E}_W(f_0)\mathbb{E}_W(f_1)\mathbb{E}_W(f_2)\mathbb{E}_W(f_3)| \geq \eta \tag{8.9} \]

and

\[ \max_{0 \leq i \leq 3} \mathbb{E}_W(f_i) \leq 6\eta^{1/4} \tag{8.10} \]

for some $\eta$, $1 \geq \eta \geq 2^{40}|F|^{6d_1 + 6d_2 - r}$. Then there exists $i$, $0 \leq i \leq 3$, such that one of
the following two possibilities holds:

• (medium-sized increment on large subspace) There exists an affine subspace $W'$
of $W$ with dimension satisfying

\[ \dim(W') \geq \dim(W) - d_1 \]

such that we have the density increment

\[ \mathbb{E}_{W'}(f_i) \geq \mathbb{E}_W(f_i) + 2^{-13}\eta^2. \]

• (large increment on medium-sized subspace) There exists a positive integer $K \leq
(16/\eta)^3$, and an affine subspace $W'$ of $W$ with dimension satisfying

\[ \dim(W') \geq \frac{1}{K+1}\dim(W) - 2(16/\eta)^3 - d_1 \]

such that we have the density increment

\[ \mathbb{E}_{W'}(f_i) \geq \mathbb{E}_W(f_i) + 2^{-10}K^{1/3}\eta^{1/4}. \]

Remarks. The constants such as $2^{20}$ appearing here are not best possible. However, to
remove the hypotheses on rank completely will require an additional argument which we
present after proving this theorem. The density increment obtained here is somewhat
better than that in Proposition 7.6, for when $K$ is small we do not reduce the dimension
of $W$ by as much as in that proposition, and when $K$ is large we increase the density
on $W'$ by significantly more.
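Proof. Let $\Gamma$, $\Phi$ and $\mathbf{f}_i$ be as before: recall that $\Gamma(x) = (\gamma_1(x), \dots, \gamma_{d_1}(x))$, $\Phi(x) =
(\phi_1(x), \dots, \phi_{d_2}(x))$ and that $f_i$ is a $\mathcal{B}$-measurable function such that

\[ \mathbf{f}_i(\Gamma(x), \Phi(x)) = f_i(x). \]

Applying Lemma 8.5 we immediately deduce that

\[ \Big| \mathbb{E}_{x_1, h_1 \in T_F^{d_1}} \sum_{\xi \in F^{d_2}} \hat{\mathbf{f}}_0(x_1, \xi)\, \hat{\mathbf{f}}_1(x_1 + h_1, -3\xi)\, \hat{\mathbf{f}}_2(x_1 + 2h_1, 3\xi)\, \hat{\mathbf{f}}_3(x_1 + 3h_1, -\xi) - \mathbb{E}_W(f_0)\mathbb{E}_W(f_1)\mathbb{E}_W(f_2)\mathbb{E}_W(f_3) \Big| \geq \eta/2. \tag{8.11} \]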

Now fj(xi,0) is close to the average of on the affine space r -1 (a;i). Indeed apply- 
ing Lemma 18.41 with / replaced by /lr-i( Xl ), so that the corresponding function on 
configuration space is f 1 „d 2 , we see that 

|fi(xi,0) -E x£r - Hxi) f t (x)\ ^ \ F \<&+*-*)/* <: 2~V- 
Hence if there is some % for which 

E^Jf^O) -E^/OI > 2- x V (8.12) 

then

\[ \mathbb{E}_{x_1 \in T_F^{d_1}} |\mathbb{E}_{x \in \Gamma^{-1}(x_1)} f_i(x) - \mathbb{E}_W(f_i)| \geq 2^{-13}\eta^2, \]

or in other words

\[ \|\mathbb{E}(f_i|\mathcal{B}_\Gamma) - \mathbb{E}_W(f_i)\|_{L^1(W)} \geq 2^{-13}\eta^2, \]

where $\mathcal{B}_\Gamma$ is the linear factor of complexity $d_1$ determined by the affine spaces $\Gamma^{-1}(x_1)$,
$x_1 \in T_F^{d_1}$. Lemma 7.5 then tells us that there is some subspace $W'$ with $\dim(W') \geq
\dim(W) - d_1$, such that

\[ \mathbb{E}_{W'}(f_i) \geq \mathbb{E}_W(f_i) + 2^{-13}\eta^2. \]

In this case, then, we have a medium-sized density increment on a large subspace and
are done. Suppose, henceforth, that (8.12) does not hold.

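Now from Lemma 4.2 we conclude

\[ \Big| \mathbb{E}_{x_1, h_1 \in T_F^{d_1}} \hat{\mathbf{f}}_0(x_1, 0)\, \hat{\mathbf{f}}_1(x_1 + h_1, 0)\, \hat{\mathbf{f}}_2(x_1 + 2h_1, 0)\, \hat{\mathbf{f}}_3(x_1 + 3h_1, 0) - \mathbb{E}_W(f_0)\mathbb{E}_W(f_1)\mathbb{E}_W(f_2)\mathbb{E}_W(f_3) \Big| \leq \eta/4 \]

and hence, from (8.11), it follows that we must have

\[ \mathbb{E}_{x_1, h_1 \in T_F^{d_1}} \sum_{\xi \in F^{d_2} \setminus 0} |\hat{\mathbf{f}}_0(x_1, \xi)\, \hat{\mathbf{f}}_1(x_1 + h_1, -3\xi)\, \hat{\mathbf{f}}_2(x_1 + 2h_1, 3\xi)\, \hat{\mathbf{f}}_3(x_1 + 3h_1, -\xi)| \geq \eta/4. \tag{8.13} \]

From Hölder's inequality and a change of variables we conclude that there exists $i$,
$0 \leq i \leq 3$, such that

\[ \mathbb{E}_{x_1 \in T_F^{d_1}} \sum_{\xi \in F^{d_2} \setminus 0} |\hat{\mathbf{f}}_i(x_1, \xi)|^4 \geq \eta/4. \]

It is immediate from Plancherel's identity that

\[ \sum_{\xi \in F^{d_2} \setminus 0} |\hat{\mathbf{f}}_i(x_1, \xi)|^4 \leq 1, \]

and so a simple averaging argument tells us that for a proportion at least $\eta/8$ of the
$x_1 \in T_F^{d_1}$ we have

\[ \sum_{\xi \in F^{d_2} \setminus 0} |\hat{\mathbf{f}}_i(x_1, \xi)|^4 \geq \eta/8. \tag{8.14} \]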

Now we are assuming that (18.121) does not hold, that is to say 

E xi& %\t i (x 1 ,0)--E w (f i )\ < 2~ 1 V- 

It follows that the proportion of values of x\ G T"^ 1 for which it is not the case that 

|5(ar 1 ,0)-E w (/ < )| ^2~ 9 V (8.15) 
is less than $\eta/8$. In particular, there is at least one value of $x_1$ such that both (8.14) and
(8.15) are satisfied. Fix this $x_1$, and write $F_i(x_2) := \mathbf{f}_i(x_1, x_2)$ and $\hat{F}_i(\xi) := \hat{\mathbf{f}}_i(x_1, \xi)$.
Note that (8.15) can be written in the form

|F l (0)-E w (/ l )K2- 1 V (8.16) 

In particular, in view of (8.10), we have

\[ |\hat{F}_i(0)| \leq 8\eta^{1/4}. \tag{8.17} \]



At this point we employ some arguments very close to those of Heath-Brown [20] and
Szemerédi [30]. From Plancherel's theorem we have



E |Fi(0l 2 O. 

and hence by ( 18.141) 

E 

feF d 2\0:|Fi(O| 2 ^/16 

Let us order the £ in this summation as ^1,^2, • • • , O in decreasing order of |Fj(£)|, so 
that J ^ 16 / 77 and 

By the pigeonhole principle and the fact that C(4/3) ^ 16, there exists K, 1 ^ K ^ J, 
such that 



Fix this fT, and set S := {£1, . . . , We clearly have 

EiF 4 (or^^iF,(^)r^v /2 ^ i/3 /i6. 

Thus $S$ has captured a significant amount of $L^2$ energy of $\hat{F}_i$ in frequency space; we
now look for a similar concentration of energy in physical space. Let $S^\perp \subseteq T_F^{d_2}$ be the
orthogonal complement of $S$ in $T_F^{d_2}$, which is a linear subspace of $T_F^{d_2}$ of codimension
at most $|S|$. Note that $\{0\} \cup S \subseteq S^{\perp\perp}$. From the Poisson summation formula and
Plancherel we have

E |F,(0| 2 = E eTd2 |E xec+s xF,(x)| 2 



1 F 
and hence



E^lE^xF^)! 2 > {F^ + rj^K^/m. 
Now from the positivity of Fj we have 

E^lE^.+sxF^x)! = E cgT d 2 E x . ec+5 xFi(x) = Fj(0), 

and hence there exists a coset c + S' of S such that 

E^^iF,^) > Fi(0) +r ? 1 / 2 ir 1 / 3 /16F,,(0). 

Using (8.16) and (8.17) it is a simple matter to conclude that

\[ \mathbb{E}_{x \in c + S^\perp} F_i(x) \geq \mathbb{E}_W(f_i) + K^{1/3}\eta^{1/4}/256. \tag{8.18} \]

Now recall that $F_i(x) = \mathbf{f}_i(x_1, x)$, and that $f_i(x) = \mathbf{f}_i(\Gamma(x), \Phi(x))$. Thus (8.18) is
asserting a density increment for $f_i$ on

\[ A := \Gamma^{-1}(x_1) \cap \Phi^{-1}(c + S^\perp). \]

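Note that $A$ is a collection of $\mathcal{B}_S$-atoms, where $\mathcal{B}_S$ is the factor of complexity $(d_1, K)$
generated by $\gamma_1, \dots, \gamma_{d_1}$ and $\xi_1 \cdot \Phi, \dots, \xi_K \cdot \Phi$. To quantify this density increment precisely
we must apply Lemma 8.4. Write $g := f_i 1_A$ and note that if

\[ \mathbf{g}(t_1, t_2) := 1_{t_1 = x_1} 1_{t_2 \in c + S^\perp}\, \mathbf{f}_i(t_1, t_2) \]

then

\[ g(x) = \mathbf{g}(\Gamma(x), \Phi(x)). \]

Applying Lemma 8.4 to the function $g$ and noting that

\[ \mathbb{E}_{T_F^{d_1} \times T_F^{d_2}}(\mathbf{g}) = |F|^{-d_1}\, \mathbb{E} F_i 1_{c + S^\perp}, \]

we conclude that

\[ \big| |F|^{-d_1}\, \mathbb{E} F_i 1_{c + S^\perp} - \mathbb{E}_W(f_i 1_A) \big| \leq |F|^{(d_1 + d_2 - r)/2}. \tag{8.19} \]

Applying the same lemma to the function $1_A$, we also have

\[ \big| |F|^{-d_1}\, \mathbb{E} 1_{c + S^\perp} - \mathbb{E}_W 1_A \big| \leq |F|^{(d_1 + d_2 - r)/2}. \tag{8.20} \]

Now we certainly have

\[ \mathbb{E} 1_{c + S^\perp} \geq |F|^{-d_2}; \tag{8.21} \]

this together with (8.20) and our assumption on $r$ implies that

\[ \mathbb{E}_W 1_A \geq \tfrac{1}{2}|F|^{-d_1 - d_2}. \tag{8.22} \]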

Combining (jHZESD with <K7I\i gives 

\F\ d ^E w (fl A ) 



E x£c+s ±Fi(x) 



El c+S 

whilst (Km and (EZZD together yield 

1 



< |^|(3di+3d 2 -r)/2 



^ (E W 1 A )(E1 C+S x) ^ 1 1 



El c+5 x E W 1 A 

Combining these last two inequalities and recalling our assumption on the relation 
between 77 and r, we obtain 

lE^+sxF.Or) -E A (ft)\ < 31^1(3^+5^-^/2 ^ 2 -i8^_ 
Inserting this into (8.18) we conclude that

\[ \mathbb{E}_A(f_i) \geq \mathbb{E}_W(f_i) + 2^{-10}K^{1/3}\eta^{1/4}. \]

In particular there is some atom $A'$ in the factor $\mathcal{B}_S$ such that

\[ \mathbb{E}_{A'}(f_i) \geq \mathbb{E}_W(f_i) + 2^{-10}K^{1/3}\eta^{1/4}. \]

Applying Corollary 7.4 to the factor $\mathcal{B}_S$, we can then find an affine subspace $W'$ of $W$
with dimension satisfying

\[ \dim(W') \geq \frac{1}{K+1}\dim(W) - \frac{d_1 + 2K}{K+1} - K \geq \frac{1}{K+1}\dim(W) - 2(16/\eta)^3 - d_1 \]

such that

\[ \mathbb{E}_{W'}(f_i) \geq \mathbb{E}_W(f_i) + 2^{-10}K^{1/3}\eta^{1/4}. \]

The claim follows. □

The last result was proved under two assumptions, namely homogeneity and a rank
condition. We now remove these hypotheses. To remove the latter we first need a
lemma.

Lemma 8.7 (Rank lemma). Let $G$ be a linear space, and let $M_1, \dots, M_d : G \to G$ be
self-adjoint linear transformations. Let $r \geq 0$ be arbitrary. Then there exists a linear
subspace $H$ of $G$ of codimension at most $rd$, and self-adjoint linear transformations
$M'_1, \dots, M'_{d'} : H \to H$ for some $0 \leq d' \leq d$, such that we have the rank condition

\[ \mathrm{rank}(a_1 M'_1 + \dots + a_{d'} M'_{d'}) > r \tag{8.23} \]

whenever $(a_1, \dots, a_{d'}) \in F^{d'} \setminus \{0\}$. In particular $M'_1, \dots, M'_{d'}$ are linearly independent.
Moreover, interpreting $M_i|_H$ as a map into $H$, we have $M_i|_H \in \langle M'_1, \dots, M'_{d'} \rangle$ for each
$i$.

Proof. We induct on $d$. The case $d = 0$ is vacuously true, so suppose $d \geq 1$ and the
claim has already been proven for $d - 1$. We may assume that

\[ \mathrm{rank}(a_1 M_1 + \dots + a_d M_d) \leq r \]

for some $a_1, \dots, a_d$ not all zero, since otherwise we could just set $d' = d$, $H = G$, and
$M'_j = M_j$. By symmetry and scaling we can take $a_d = 1$. If we then let $G'$ be the
kernel of $a_1 M_1 + \dots + a_d M_d$ then $G'$ has codimension at most $r$, and when restricted
to $G'$ the transformation $M_d$ is a linear combination of the $M_1, \dots, M_{d-1}$ and can thus
be safely omitted. The claim then follows from applying the induction hypothesis to $G'$
and $M_1, \dots, M_{d-1}$. □

Theorem 8.8 (Anomalous AP4-count implies density increment, II). Let $W_0, W_1, W_2,
W_3$ be a progression of affine spaces, and on each $W_i$ let $\mathcal{B}_i$ be a pure quadratic factor
of complexity at most $d$, and let $f_i : W_i \to \mathbb{R}^+$ be 1-bounded non-negative $\mathcal{B}_i$-measurable
functions which obey the estimates

\[ |\Lambda_{W_0,W_1,W_2,W_3}(f_0, f_1, f_2, f_3) - \mathbb{E}_{W_0}(f_0)\mathbb{E}_{W_1}(f_1)\mathbb{E}_{W_2}(f_2)\mathbb{E}_{W_3}(f_3)| \geq \eta \tag{8.24} \]



and 



maxE w .(fi) OTy 1 / 4 (8.25) 

0<i<3 
for some $\eta$, $0 < \eta \leq 1$. Then there exists $i$, $0 \leq i \leq 3$, such that one of the following
two alternatives holds:

• (medium-sized increment on large subspace) There exists an affine subspace $W'_i$
of $W_i$ with dimension satisfying

\[ \dim(W'_i) \geq \dim(W_i) - 200d^2 - 8d\log_{|F|}\frac{2^{42}}{\eta} \]

such that we have the density increment

\[ \mathbb{E}_{W'_i}(f_i) \geq \mathbb{E}_{W_i}(f_i) + 2^{-23}\eta^3. \]

• (large increment on a medium-sized subspace) There exists a positive integer
$K \leq (64/\eta)^3$, and an affine subspace $W'_i$ of $W_i$ with dimension satisfying

\[ \dim(W'_i) \geq \frac{1}{K+1}\dim(W_i) - 2(64/\eta)^3 - 200d^2 - 8d\log_{|F|}\frac{2^{42}}{\eta} \]

such that we have the density increment

\[ \mathbb{E}_{W'_i}(f_i) \geq \mathbb{E}_{W_i}(f_i) + 2^{-13}K^{1/3}\eta^{1/4}. \]

Proof. By translating each of the $W_i$ (and also $f_i$ and $\mathcal{B}_i$) we may assume that
$W_0 = W_1 = W_2 = W_3 = G$ is a vector space. Let $\mathcal{B} = \mathcal{B}_0 \vee \mathcal{B}_1 \vee \mathcal{B}_2 \vee \mathcal{B}_3$, so that $\mathcal{B}$ is
a quadratic factor generated by $4d$ quadratic phases $\phi_1, \dots, \phi_{4d}$ (adding dummy phases
if necessary). Let $r$ be the integer part of $50d + \log_{|F|}\frac{2^{42}}{\eta}$. We use Lemma 8.7 to find a
linear subspace $H$ of $G$ of dimension at least

\[ \dim(H) \geq \dim(G) - 4dr \geq \dim(G) - 200d^2 - 4d\log_{|F|}\frac{2^{42}}{\eta} \]

and self-adjoint matrices $M'_1, \dots, M'_{d'}$ for some $d'$, $0 \leq d' \leq 4d$, obeying the rank
condition (8.23), such that each of the Hessians $\nabla^2\phi_1, \dots, \nabla^2\phi_{4d}$ when restricted to $H$
becomes a linear combination of the $M'_1, \dots, M'_{d'}$.
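
The choice of $r$ and the resulting dimension loss are simple arithmetic, and a short sketch makes the orders of magnitude concrete. The function names below are assumptions made for illustration; floating-point logarithms are used, so the values are approximate near integer boundaries.

```python
import math


def rank_parameter(d, eta, field_size):
    """Integer part of 50*d + log_{|F|}(2^42 / eta), as chosen in the proof."""
    return math.floor(50 * d + math.log(2 ** 42 / eta, field_size))


def codimension_bound(d, eta, field_size):
    """Upper bound 200*d^2 + 4*d*log_{|F|}(2^42 / eta) on the codimension 4*d*r of H."""
    return 200 * d ** 2 + 4 * d * math.log(2 ** 42 / eta, field_size)
```

For example, with $d = 2$, $\eta = 0.1$ and $|F| = 5$ one gets `rank_parameter(2, 0.1, 5) == 119`, so $H$ has codimension at most $4 \cdot 2 \cdot 119 = 952$, which is below `codimension_bound(2, 0.1, 5)` (roughly 956). The loss is quadratic in $d$ but only logarithmic in $1/\eta$.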

Let $\mathcal{B}'$ be the linear factor generated by the cosets of $H$. We may assume that

\[ \|\mathbb{E}(f_i|\mathcal{B}') - \mathbb{E}_G(f_i)\|_{L^1(G)} \leq 2^{-22}\eta^3 \tag{8.26} \]

for each $i$: if not then Lemma 7.5 provides the claimed medium-sized density increment
on a large subspace and we are done.

Assuming then that (18.261) holds, it follows from Lemma 14.21 that 

|Aga6;g(E(/o|B0,E(/x|B0,E(/ 2 |B0,E(/ 3 |B0)-Eg(/o)E g (/ 1 )E g (/ 2 )E g (/ 3 )| ^ v /2, 
and hence by (18.241) that 

|A GjGAG (/ ,/i ,/ 2 ,/ 3 ) - A G , G ,GMfo\B')Mfi\13')Mf2\B')Mfc\13'))\ > rj/2. 
We can rewrite the left-hand side here as

\[ \left| \mathbb{E}_{x,h \in G/H}\Big( \Lambda_{x+H,\,x+h+H,\,x+2h+H,\,x+3h+H}(f_0,f_1,f_2,f_3) - \mathbb{E}_{x+H}(f_0)\mathbb{E}_{x+h+H}(f_1)\mathbb{E}_{x+2h+H}(f_2)\mathbb{E}_{x+3h+H}(f_3) \Big) \right|, \]
so for a proportion at least $\eta/4$ of the pairs $(x,h)$ in $G/H \times G/H$ we have

\[ \left| \Lambda_{x+H,\,x+h+H,\,x+2h+H,\,x+3h+H}(f_0,f_1,f_2,f_3) - \mathbb{E}_{x+H}(f_0)\mathbb{E}_{x+h+H}(f_1)\mathbb{E}_{x+2h+H}(f_2)\mathbb{E}_{x+3h+H}(f_3) \right| \geq \eta/4. \tag{8.27} \]
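
The passage from an average of size at least $\eta/2$ to a proportion $\eta/4$ of pairs is a standard averaging step. A sketch, under the assumption (not restated in this part of the text) that the $f_i$ take values in $[0,1]$, so that the bracketed quantity $X(x,h)$ has $|X| \leq 1$; the names $X$ and $p$ are introduced here only for illustration:

```latex
% X(x,h): the difference inside the average; p: proportion of pairs with |X| >= eta/4.
% Assumption: |X(x,h)| <= 1 for every pair (x,h).
\begin{align*}
\frac{\eta}{2} \;\le\; \bigl|\mathbb{E}_{x,h} X\bigr| \;\le\; \mathbb{E}_{x,h}|X|
  \;\le\; p \cdot 1 + (1-p)\cdot\frac{\eta}{4}
  \;\le\; p + \frac{\eta}{4},
\qquad\text{hence}\qquad p \;\ge\; \frac{\eta}{4}.
\end{align*}
```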

Now there must be a pair $(x, h)$ satisfying (8.27), and such that

\[ \left| \mathbb{E}_{x+ih+H}(f_i) - \mathbb{E}_G(f_i) \right| \leq 2^{-18}\eta^2 \quad \text{for } i = 0,1,2,3. \tag{8.28} \]
Indeed if there were not then for some $i$ we would have

\[ \left| \mathbb{E}_{x+ih+H}(f_i) - \mathbb{E}_G(f_i) \right| > 2^{-18}\eta^2 \]

for a proportion at least $\eta/16$ of the pairs $(x, h) \in G/H \times G/H$. This would be contrary
to (8.26). We now take advantage of the affine invariance of our setup by translating
each of the $f_i$ so that $x = h = 0$, so that (8.27) becomes

\[ \left| \Lambda_{H,H,H,H}(f_0,f_1,f_2,f_3) - \mathbb{E}_H(f_0)\mathbb{E}_H(f_1)\mathbb{E}_H(f_2)\mathbb{E}_H(f_3) \right| \geq \eta/4. \]
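
The counting behind the choice of the pair $(x,h)$ in (8.28) is a pigeonhole argument over the four indices. A sketch, where $S$, $S_i$, $\mu$ and $\tau$ are names introduced only for this illustration: $S$ is the set of pairs satisfying (8.27), $\tau$ is the threshold on the right-hand side of (8.28), $S_i$ is the set of pairs for which the $i$-th bound in (8.28) fails, and $\mu$ denotes the proportion of pairs in $G/H \times G/H$.

```latex
% Hypothetical names: S, S_i, mu, tau (see the lead-in); none are notation from the paper.
% If no pair in S satisfied (8.28), every pair in S would lie in some S_i.
\begin{align*}
S \subseteq S_0 \cup S_1 \cup S_2 \cup S_3
\quad\Longrightarrow\quad
\frac{\eta}{4} \;\le\; \mu(S) \;\le\; \sum_{i=0}^{3} \mu(S_i) \;\le\; 4 \max_{0 \le i \le 3} \mu(S_i),
\end{align*}
% so some single index i has mu(S_i) >= eta/16.
```

This is the source of the proportion $\eta/16$ in the argument above.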

Now observe from (7.1) that each of the phase functions $\phi_1, \ldots, \phi_{4d}$, when restricted
to $H$, is equal to a linear combination of the homogeneous quadratic phases
$M_1' x \cdot x, \ldots, M_{d'}' x \cdot x$ plus a homogeneous linear phase, plus a constant. By collecting all these
homogeneous quadratic and linear phases together, and omitting any which are linearly
dependent (note that the rank condition on the $M_j'$ ensures that the quadratic phases
have no such linear dependence) we can thus find a homogenized quadratic factor $\mathcal{B}$ on
$H$ of complexity at most $(4d, d')$ which has rank greater than $r$, such that all the phases
$\phi_1, \ldots, \phi_{4d}$ are $\mathcal{B}$-measurable on $H$. In particular $f_0, f_1, f_2, f_3$ are also $\mathcal{B}$-measurable.
The claim now follows from Theorem 8.6 (with $W$ replaced by $H$, and $\eta$ replaced by
$\eta/4$) and (Km . □

We are finally able to prove Theorem 4.1 (and thus, by the argument at the start of §4,
Theorem 1.1).



Proof of Theorem 4.1. We apply Corollary 6.7 to obtain a quadratic factor $(\mathcal{B}_1, \mathcal{B}_2)$
in $W$ of complexity at most $(d_1, d_2) := ((32/\delta^4)^{3C_0}, (32/\delta^4)^{2C_0})$ such that the function
$g := \mathbb{E}(f|\mathcal{B}_2)$ obeys

\[ \left| \Lambda_{W,W,W,W}(g,g,g,g) - \Lambda_{W,W,W,W}(\delta,\delta,\delta,\delta) \right| \geq \delta^4/4. \]
We may assume that 

\\E(f\B 1 )-5\\ LHW) ^2^ 2 5 1 \ (8.29) 

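since otherwise the claim would follow from Lemma 7.5. In particular we see from
Lemma 4.2 that

\[ \left| \Lambda_{W,W,W,W}(\mathbb{E}(f|\mathcal{B}_1),\mathbb{E}(f|\mathcal{B}_1),\mathbb{E}(f|\mathcal{B}_1),\mathbb{E}(f|\mathcal{B}_1)) - \Lambda_{W,W,W,W}(\delta,\delta,\delta,\delta) \right| \leq \delta^4/8 \]
and thus

\[ \left| \Lambda_{W,W,W,W}(g,g,g,g) - \Lambda_{W,W,W,W}(\mathbb{E}(f|\mathcal{B}_1),\mathbb{E}(f|\mathcal{B}_1),\mathbb{E}(f|\mathcal{B}_1),\mathbb{E}(f|\mathcal{B}_1)) \right| \geq \delta^4/8. \]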
We can rewrite the left-hand side as

\[ \left| \mathbb{E}_{W_0,W_1,W_2,W_3}\Big( \Lambda_{W_0,W_1,W_2,W_3}(g,g,g,g) - \mathbb{E}_{W_0}(f)\mathbb{E}_{W_1}(f)\mathbb{E}_{W_2}(f)\mathbb{E}_{W_3}(f) \Big) \right|, \]
where the $W_0, W_1, W_2, W_3$ range over all quadruples of atoms of $\mathcal{B}_1$ in arithmetic
progression. For a proportion at least $\delta^4/16$ of these progressions we have

\[ \left| \Lambda_{W_0,W_1,W_2,W_3}(g,g,g,g) - \mathbb{E}_{W_0}(f)\mathbb{E}_{W_1}(f)\mathbb{E}_{W_2}(f)\mathbb{E}_{W_3}(f) \right| \geq \delta^4/16. \]

By a simple averaging argument using (8.29) we can find a progression $W_0, W_1, W_2, W_3$
with this property such that

\[ \left| \mathbb{E}_{W_i}(f) - \mathbb{E}_W(f) \right| \leq 2^{-36}\delta^{12} \]
for all $i = 0, 1, 2, 3$. The claim now follows from Theorem 8.8 with $\eta := \delta^4/16$. □